Faculty of Degree Engineering - 083
Department of CE/IT (07/16)
Enrollment no: _____________________ Seat No: _____________________
B.E. – SEMESTER – VII PRE-FINAL EXAMINATION – 2016 (ODD)
Subject Code: 2170714 Date: 17-10-2016
Subject Name: Distributed DBMS Time: 10:00 AM To 12:30 PM
Duration: 2:30 hr. Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions where necessary.
3. Figures to the right indicate full marks.
Q. 1 (A) Explain the potential problems with DDBMS. [07]
A. 1 (A)
1.6.1 Distributed Database Design
The question that is being addressed is how the database and the applications that run against it should be placed across the sites. There are two basic alternatives to placing data: partitioned (or non-replicated) and replicated. In the partitioned scheme the database is divided into a number of disjoint partitions, each of which is placed at a different site. Replicated designs can be either fully replicated (also called fully duplicated), where the entire database is stored at each site, or partially replicated (or partially duplicated), where each partition of the database is stored at more than one site, but not at all the sites. The two fundamental design issues are fragmentation, the separation of the database into partitions called fragments, and distribution, the optimum distribution of fragments.
The research in this area mostly involves mathematical programming in order to minimize the combined cost of storing the database, processing transactions against it, and message communication among sites. The general problem is NP-hard. Therefore, the proposed solutions are based on heuristics.
1.6.2 Distributed Directory Management
A directory contains information (such as descriptions and locations) about data items in the database. Problems related to directory management are similar in nature to the database placement problem discussed in the preceding section. A directory may be global to the entire DDBS or local to each site; it can be centralized at one site or distributed over several sites; there can be a single copy or multiple copies.
1.6.3 Distributed Query Processing
Query processing deals with designing algorithms that analyze queries and convert them into a series of data manipulation operations. The problem is how to decide on a strategy for executing each query over the network in the most cost-effective way, however cost is defined. The factors to be considered are the distribution of data, communication costs, and lack of sufficient locally available information. The objective is to optimize where the inherent parallelism is used to improve the performance of executing the transaction, subject to the above-mentioned constraints. The problem is NP-hard in nature, and the approaches are usually heuristic.
1.6.4 Distributed Concurrency Control
Concurrency control involves the synchronization of accesses to the distributed database, such that the integrity of the database is maintained. It is, without any doubt, one of the most extensively studied problems in the DDBS field. The concurrency control problem in a distributed context is somewhat different than in a centralized framework. One not only has to worry about the integrity of a single database, but also about the consistency of multiple copies of the database. The condition that requires all the values of multiple copies of every data item to converge to the same value is called mutual consistency.
1.6.5 Distributed Deadlock Management
The deadlock problem in DDBSs is similar in nature to that encountered in operating
systems. The competition among users for access to a set of resources (data, in this
case) can result in a deadlock if the synchronization mechanism is based on locking.
The well-known alternatives of prevention, avoidance, and detection/recovery also
apply to DDBSs.
1.6.6 Reliability of Distributed DBMS
We mentioned earlier that one of the potential advantages of distributed systems is improved reliability and availability. This, however, is not a feature that comes automatically. It is important that mechanisms be provided to ensure the consistency of the database as well as to detect failures and recover from them. The implication for DDBSs is that when a failure occurs and various sites become either inoperable or inaccessible, the databases at the operational sites remain consistent and up to date. Furthermore, when the computer system or network recovers from the failure, the DDBSs should be able to recover and bring the databases at the failed sites up-to-date. This may be especially difficult in the case of network partitioning, where the sites are divided into two or more groups with no communication among them.
1.6.7 Replication
If the distributed database is (partially or fully) replicated, it is necessary to implement protocols that ensure the consistency of the replicas, i.e., that copies of the same data item have the same value. These protocols can be eager, in that they force the updates to be applied to all the replicas before the transaction completes, or they may be lazy, so that the transaction updates one copy (called the master) from which updates are propagated to the others after the transaction completes.
1.6.8 Relationship among Problems
The design of distributed databases affects many areas. It affects directory management, because the definition of fragments and their placement determine the contents of the directory (or directories) as well as the strategies that may be employed to manage them. The same information (i.e., fragment structure and placement) is used by the query processor to determine the query evaluation strategy. On the other hand, the access and usage patterns that are determined by the query processor are used as inputs to the data distribution and fragmentation algorithms. Similarly, directory placement and contents influence the processing of queries.
There is a strong relationship among the concurrency control problem, the deadlock management problem, and reliability issues. This is to be expected, since together they are usually called the transaction management problem. The concurrency control algorithm that is employed will determine whether or not a separate deadlock management facility is required. If a locking-based algorithm is used, deadlocks will occur, whereas they will not if time stamping is the chosen alternative.
Q (B) Explain RAID Level Recovery Technique. [07]
A (B) RAID, or Redundant Array of Independent Disks, is a technology to connect multiple secondary storage devices and use them as a single storage medium. RAID consists of an array of disks in which multiple disks are connected together to achieve different goals. RAID levels define the use of disk arrays.
RAID 0
In this level, a striped array of disks is implemented. The data is broken down into blocks and the blocks are distributed among disks. Each disk receives a block of data to write/read in parallel. It enhances the speed and performance of the storage device. There is no parity and backup in Level 0.
RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of the data to all the disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of a failure.
RAID 2
• RAID Level 2 uses the concept of a parallel access technique. It works at the word (byte) level, so each strip stores one bit. It takes data striping to the extreme, writing only 1 bit per strip instead of arbitrary-size blocks. For this reason, it requires a minimum of 8 surfaces to write data to the hard disk.
• In RAID level 2, strips are very small, so when a block is read, all disks are accessed in parallel.
• Hamming code generation is time consuming, therefore RAID level 2 is too slow for most commercial applications.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity bit generated for a data word is stored on a different disk. This technique makes it possible to recover from single disk failures.
RAID 4
In this level, an entire block of data is written onto data disks and then the parity is generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping. Both level 3 and level 4 require at least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for a data block stripe are distributed among all the data disks rather than being stored on a different dedicated disk.
RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated and stored in distributed fashion among multiple disks. Two parities provide additional fault tolerance. This level requires at least four disk drives to implement RAID.
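The recovery behind the parity-based levels (RAID 3-5) is simple XOR arithmetic: the parity block is the XOR of the data blocks in a stripe, so any one lost block can be rebuilt from the survivors. Below is a minimal Python sketch with made-up block contents; no real controller works on Python byte strings like this.

    # Minimal sketch: XOR parity across one stripe, as used conceptually
    # by RAID 4/5. Block contents here are invented byte strings.
    from functools import reduce

    def xor_blocks(blocks):
        # XOR a list of equally sized byte blocks together
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    stripe = [b"\x01\x02", b"\x0a\x0b", b"\xff\x00"]   # data blocks on 3 disks
    parity = xor_blocks(stripe)                        # stored on a 4th disk

    # Disk holding stripe[1] fails: rebuild its block from survivors + parity.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    assert rebuilt == stripe[1]                        # lost block recovered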
Q. 2 (A) What is Concurrency? List out methods of Concurrency control and explain any one of them. [07]
A. 2 (A) Concurrency: In computer science, concurrency is the execution of several instruction sequences at the same time.
• In a distributed database system, the database is typically used by many users. These systems usually allow multiple transactions to run concurrently at the same time.
• It must support parallel execution of transactions.
• Communication delay is less.
• It must recover from site and communication failures.
Methods of concurrency control
Locking-Based Concurrency Control
• The main idea of locking-based concurrency control is to ensure that a data item that is shared by conflicting operations is accessed by one operation at a time.
• This lock is set by a transaction before the item is accessed and is reset at the end of its use.
• There are two types of locks: read lock (rl) and write lock (wl); a sketch of a simple 2PL lock manager is given below.
Locking-Based Concurrency Control Algorithms
2PL Lock Graph
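As referenced above, here is a minimal sketch of a strict two-phase locking (2PL) lock manager in Python. The LockManager class, its compatibility rule, and the transaction names are invented for illustration; they are not any particular DBMS's API.

    # Minimal strict-2PL sketch: read locks are shared, write locks exclusive,
    # and all locks are held until commit (the shrinking phase happens all at
    # once at commit/abort). All names are illustrative.
    class LockManager:
        def __init__(self):
            self.locks = {}          # item -> (mode, set of holder txns)

        def lock(self, txn, item, mode):
            # Try to acquire mode ("rl" or "wl") on item; return success.
            held = self.locks.get(item)
            if held is None:
                self.locks[item] = (mode, {txn})
                return True
            held_mode, holders = held
            if mode == "rl" and held_mode == "rl":   # read locks are shared
                holders.add(txn)
                return True
            if holders == {txn}:                     # upgrade our own lock
                self.locks[item] = ("wl" if mode == "wl" else held_mode, {txn})
                return True
            return False                             # conflict: caller waits/aborts

        def release_all(self, txn):
            # Called at commit/abort: release every lock txn holds.
            for item in list(self.locks):
                mode, holders = self.locks[item]
                holders.discard(txn)
                if not holders:
                    del self.locks[item]

    lm = LockManager()
    assert lm.lock("T1", "x", "rl")
    assert lm.lock("T2", "x", "rl")       # shared read is allowed
    assert not lm.lock("T2", "x", "wl")   # write conflicts with T1's read lock
    lm.release_all("T1")
    assert lm.lock("T2", "x", "wl")       # now T2 can upgrade and write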
Q (B) Explain layers of query processing. [07]
A (B) Query Decomposition
• The first layer decomposes the calculus query into an algebraic query on global relations. The information needed for this transformation is found in the global conceptual schema describing the global relations.
• Query decomposition can be viewed as four successive steps.
• First, the calculus query is rewritten in a normalized form that is suitable for subsequent manipulation. Normalization of a query generally involves the manipulation of the query quantifiers and of the query qualification by applying logical operator priority.
• Second, the normalized query is analyzed semantically so that incorrect queries are detected and rejected as early as possible. Techniques to detect incorrect queries exist only for a subset of relational calculus. Typically, they use some sort of graph that captures the semantics of the query.
• Third, the correct query (still expressed in relational calculus) is simplified. One way to simplify a query is to eliminate redundant predicates. Note that redundant queries are likely to arise when a query is the result of system transformations applied to the user query. Such transformations are used for performing semantic data control (views, protection, and semantic integrity control).
• Fourth, the calculus query is restructured as an algebraic query. The traditional way to do this transformation toward a “better” algebraic specification is to start with an initial algebraic query and transform it in order to find a “good” one.
• The algebraic query generated by this layer is good in the sense that the worst executions are typically avoided.
Data Localization
• The input to the second layer is an algebraic query on global relations. The main role of the second layer is to localize the query’s data using data distribution information in the fragment schema.
• This layer determines which fragments are involved in the query and transforms the distributed query into a query on fragments.
• A global relation can be reconstructed by applying the fragmentation rules, and then deriving a program, called a localization program, of relational algebra operators, which then act on fragments.
• Generating a query on fragments is done in two steps:
• First, the query is mapped into a fragment query by substituting each relation by its reconstruction program (also called a materialization program).
• Second, the fragment query is simplified and restructured to produce another “good” query.
Global Query Optimization
• The input to the third layer is an algebraic query on fragments. The goal of query optimization is to find an execution strategy for the query which is close to optimal.
• The previous layers have already optimized the query, for example, by eliminating redundant expressions. However, this optimization is independent of fragment characteristics such as fragment allocation and cardinalities.
• Query optimization consists of finding the “best” ordering of operators in the query, including communication operators that minimize a cost function.
• The output of the query optimization layer is an optimized algebraic query with communication operators included on fragments. It is typically represented and saved (for future executions) as a distributed query execution plan.
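As a rough illustration only, the following Python sketch models what such a distributed execution plan could look like as an operator tree whose leaves are fragment scans annotated with sites. The node kinds, fragment names, and site labels are all invented for the example.

    # Illustrative operator tree for a distributed execution plan. Real
    # optimizers use far richer plan representations than this.
    from dataclasses import dataclass, field

    @dataclass
    class Op:
        kind: str                 # "scan", "select", "join", "union", "send"
        arg: str = ""             # fragment name, predicate, or target site
        children: list = field(default_factory=list)

    # EMP is horizontally fragmented into EMP1 (site A) and EMP2 (site B);
    # the localized query unions the fragment scans, then selects and joins.
    plan = Op("join", "EMP.deptno = DEPT.deptno", [
        Op("select", "salary > 50000", [
            Op("union", "", [
                Op("send", "to site C", [Op("scan", "EMP1@siteA")]),
                Op("send", "to site C", [Op("scan", "EMP2@siteB")]),
            ]),
        ]),
        Op("scan", "DEPT@siteC"),
    ])

    def show(op, depth=0):
        # Print the plan tree, one level of indent per child
        print("  " * depth + f"{op.kind} {op.arg}".rstrip())
        for child in op.children:
            show(child, depth + 1)

    show(plan)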
Minterm Properties
• Minterm selectivity: sel(mi)
• The number of records that satisfy a minterm
• e.g., sel(m1) = 1; sel(m2) = 1; sel(m4) = 0
• Access frequency by applications and users
• Q = {q1, q2, …, qq} is the set of queries
• acc(q1) is the frequency of access of query 1
Algorithm for Determining Minterms
Rule 1: a fragment is partitioned into at least two parts that are accessed differently by at least one application.
Definitions
R - relation
Pr - set of simple predicates
Pr' - another set of simple predicates
F - set of minterm fragments
COM_MIN (R, Pr)
{ // Compute minterms
  // find a pi such that pi partitions R according to Rule 1
  Pr' = pi
  Pr = Pr - pi
  F = fi  // fi is the minterm fragment according to pi
  while (Pr' is incomplete)
  {
    find a pj in Pr that partitions some fk of Pr'
    Pr' = Pr' U pj
    Pr = Pr - pj
    F = F U fj  // fj is the minterm fragment according to pj
    if (there exists a pk in Pr' which is non-relevant)
    { // this is complex
      Pr' = Pr' - pk
      F = F - fk
    }
  }
  return Pr'
}
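To make the minterm idea concrete, here is a small Python sketch that enumerates minterm predicates (every conjunction of each simple predicate taken positively or negated) and counts how many rows satisfy each one, i.e., sel(mi). The predicates and rows are invented; this shows only the enumeration step, not COM_MIN's relevance pruning.

    # Minimal sketch: enumerate minterm predicates from simple predicates.
    # Each minterm takes every simple predicate either positively or negated.
    from itertools import product

    simple = {
        "p1": lambda r: r["budget"] <= 200000,     # invented predicates
        "p2": lambda r: r["loc"] == "Montreal",
    }

    def minterms(preds):
        # Yield (label, test) for every conjunction of p / NOT p
        names = sorted(preds)
        for signs in product([True, False], repeat=len(names)):
            label = " AND ".join(n if s else f"NOT {n}"
                                 for n, s in zip(names, signs))
            def test(r, signs=signs, names=names):
                return all(preds[n](r) == s for n, s in zip(names, signs))
            yield label, test

    rows = [{"budget": 150000, "loc": "Montreal"},
            {"budget": 250000, "loc": "Paris"}]
    for label, test in minterms(simple):
        frag = [r for r in rows if test(r)]
        print(f"{label}: {len(frag)} row(s)")      # sel(mi) per minterm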
Q.3 (A) Explain following in context of Relational algebra: [07]
1. Selection 2. Natural Join 3. Projection
A.3 (A) 1. Selection
• Produces a horizontal subset of the operand relation
• General form:
  σF(R) = {t | t ∈ R and F(t) is true}
  where
  • R is a relation, t is a tuple variable
  • F is a formula consisting of
    • operands that are constants or attributes
    • arithmetic comparison operators: <, >, =, ≠, ≤, ≥
    • logical operators: ∧, ∨, ¬
Selection Example
2. Natural Join
• Equi-join of two relations R and S over an attribute (or attributes) common to both R and S, projecting out one copy of those attributes
• R ⋈ S = ΠR∪S(σF(R × S))
Natural Join Example
3. Projection
• Produces a vertical slice of a relation
• General form:
  ΠA1,…,An(R) = {t[A1,…,An] | t ∈ R}
  where
  • R is a relation, t is a tuple variable
  • {A1,…,An} is a subset of the attributes of R over which the projection will be performed
• Note: projection can generate duplicate tuples. Commercial systems (and SQL) allow this and provide both:
  • Projection with duplicate elimination
  • Projection without duplicate elimination
Projection Example
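A small Python sketch of the three operators, modeling relations as lists of dicts. The relation contents and attribute names are invented for illustration.

    # Minimal sketch of selection, projection, and natural join over
    # relations modeled as lists of dicts. Data is invented.
    def select(rel, pred):
        # Selection: horizontal subset of tuples satisfying pred
        return [t for t in rel if pred(t)]

    def project(rel, attrs, distinct=True):
        # Projection: vertical slice; optionally eliminate duplicates
        out = [{a: t[a] for a in attrs} for t in rel]
        if not distinct:
            return out
        seen, uniq = set(), []
        for t in out:
            key = tuple(sorted(t.items()))
            if key not in seen:
                seen.add(key)
                uniq.append(t)
        return uniq

    def natural_join(r, s):
        # Natural join: equi-join on common attributes, one copy kept
        common = set(r[0]) & set(s[0]) if r and s else set()
        return [{**tr, **ts} for tr in r for ts in s
                if all(tr[a] == ts[a] for a in common)]

    emp  = [{"eno": 1, "dno": 10}, {"eno": 2, "dno": 20}]
    dept = [{"dno": 10, "dname": "CS"}, {"dno": 20, "dname": "IT"}]
    print(select(emp, lambda t: t["dno"] == 10))   # [{'eno': 1, 'dno': 10}]
    print(project(emp, ["dno"]))                   # [{'dno': 10}, {'dno': 20}]
    print(natural_join(emp, dept))                 # joined on dno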
Q (B) Explain Client Server architecture for Distributed DBMS. [07]
A (B)
• This provides a two-level architecture which makes it easier to manage the complexity of modern DBMSs and the complexity of distribution.
• The server does most of the data management work (query processing and optimization, transaction management, storage management).
• The client is the application and the user interface (managing the data that is cached to the client, managing the transaction locks).
• This architecture is quite common in relational systems where the communication between the clients and the server(s) is at the level of SQL statements.
• Multiple client - single server
From a data management perspective, this is not much different from centralized databases since the database is stored on only one machine (the server)
which also hosts the software to manage it. However, there are some differences from centralized systems in the way transactions are executed and caches are managed.
• Multiple client - multiple server
In this case, two alternative management strategies are possible: either each client manages its own connection to the appropriate server or each client knows of only its “home server” which then communicates with other servers as required.
OR
Q. 3 (A) Explain ACID property in concept of DDBMS. [07]
A. 3 (A)
• The consistency and reliability aspects of transactions are due to four properties:
1. Atomicity
2. Consistency
3. Isolation
4. Durability
• Together, these are commonly referred to as the ACID properties of transactions.
1. Atomicity
• Atomicity refers to the fact that a transaction is treated as a unit of operation. Therefore, either all the transaction’s actions are completed, or none of them are. This is also known as the “all-or-nothing property.”
2. Consistency
• The consistency of a transaction is simply its correctness.
• In other words, a transaction is a correct program that maps one consistent database state to another.
3. Isolation
• Isolation is the property of transactions that requires each transaction to see a consistent database at all times.
• In other words, an executing transaction cannot reveal its results to other concurrent transactions before its commitment.
4. Durability
• Durability refers to that property of transactions which ensures that once a transaction commits, its results are permanent and cannot be erased from the database.
Q (B) Explain AA matrix, CA matrix and BEA algorithm in context of vertical fragmentation. [07]
A (B) Determining Affinity
• The attribute use matrix does not help us yet
• We cannot determine the affinity of the attributes, because we don't know the access frequency of the attribute groups
• We need this to calculate attribute affinity - aff(Ai, Aj)
• aff(Ai, Aj) defines how often Ai and Aj are accessed together
• It depends on the frequency of query requests for attributes Ai and Aj simultaneously.
Affinity Measure
• The attribute affinity between two attributes Ai and Aj of a relation is:
  aff(Ai, Aj) = Σ over all queries qk that use both Ai and Aj of ( Σ over all sites Sl of refl(qk) · accl(qk) )
• where refl(qk) is the number of accesses to attributes Ai and Aj for each execution of application qk at site Sl
• where accl(qk) is the access frequency of qk for each site
Attribute Affinity Matrix – Example
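A small Python sketch of how an AA matrix can be computed from an attribute usage matrix and per-site access frequencies. All numbers are invented, and ref is simplified to 1 access per execution.

    # Minimal sketch: build the attribute affinity (AA) matrix from an
    # attribute usage matrix and per-site access frequencies (invented data).
    import numpy as np

    use = np.array([[1, 0, 1, 0],     # use(qk, Ai): rows = queries q1..q4,
                    [0, 1, 1, 0],     # columns = attributes A1..A4
                    [0, 1, 0, 1],
                    [1, 0, 0, 1]])
    acc = np.array([[15, 20, 10],     # acc_l(qk): rows = queries,
                    [ 5,  0,  0],     # columns = sites S1..S3
                    [25, 25, 25],
                    [ 3,  0,  0]])

    freq = acc.sum(axis=1)            # total frequency of each query
    n_attr = use.shape[1]
    AA = np.zeros((n_attr, n_attr), dtype=int)
    for i in range(n_attr):
        for j in range(n_attr):
            # sum the frequencies of queries that access both Ai and Aj
            both = (use[:, i] == 1) & (use[:, j] == 1)
            AA[i, j] = freq[both].sum()
    print(AA)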
Clustering Algorithm
• We want to find which attributes belong together in a vertically fragmented table
• Examining the matrix by hand for this small case is sufficient
Bond Energy Algorithm (BEA)
• Initialize - pick any column
• Iterate:
  Select the next column and try to place it in the matrix
  Choose the place that maximizes the global affinity
  Repeat
• Repeat the process for rows.
  However, since AA is symmetric, just reorder the rows!
  We will reorder the columns later to create a symmetric matrix
BEA Pseudocode
Bond Energy Algorithm (BEA) {
  input: AA   // attribute affinity matrix
  output: CA  // clustered attribute matrix
  // put the first two columns in
  CA(*, 1) <- AA(*, 1)
  CA(*, 2) <- AA(*, 2)
  index <- 3
  // for each of the rest of the columns of AA,
  // choose the best placement
  while (index <= n) {
    // calculate the contribution cont() of each possible place for the new column
    for (i = 1; i < index; i++)                       // iterate over the columns
      calculate cont(A(i-1), A(index), A(i));
    calculate cont(A(index-1), A(index), A(index+1)); // boundary
    loc <- placement given by maximum cont()
    for (j = index; j > loc; j--)                     // shift columns right
      CA(*, j) <- CA(*, j-1);
    CA(*, loc) <- AA(*, index);
    index <- index + 1;
  }
  // reorder the rows according to the placement of columns
}
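For reference, here is a compact runnable version of the column-placement loop in Python, using the usual textbook definitions bond(Ax, Ay) = Σz AA[z][x]·AA[z][y] and cont(Ai, Ak, Aj) = 2·bond(Ai, Ak) + 2·bond(Ak, Aj) - 2·bond(Ai, Aj). The AA values are an invented example.

    # Runnable sketch of BEA column ordering on a small AA matrix.
    import numpy as np

    AA = np.array([[45,  0, 45,  0],
                   [ 0, 80,  5, 75],
                   [45,  5, 53,  3],
                   [ 0, 75,  3, 78]])

    def bond(x, y):
        # bond of 0 for the imaginary columns beyond the matrix edges
        if x is None or y is None:
            return 0
        return int(AA[:, x] @ AA[:, y])

    def cont(i, k, j):
        # contribution of placing column k between columns i and j
        return 2 * bond(i, k) + 2 * bond(k, j) - 2 * bond(i, j)

    order = [0, 1]                    # initialize with the first two columns
    for k in range(2, AA.shape[1]):   # place each remaining column
        pairs = list(zip([None] + order, order + [None]))   # neighbor pairs
        best = max(range(len(pairs)),
                   key=lambda p: cont(pairs[p][0], k, pairs[p][1]))
        order.insert(best, k)

    CA = AA[np.ix_(order, order)]     # reorder rows the same way (AA symmetric)
    print(order)                      # clusters affine attributes together
    print(CA)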
Q.4 (A) Explain Two phase commit protocol. [07]
A.4 (A)
Two-phase commit (2PC) is a very simple and elegant protocol that ensures the atomic commitment of distributed transactions. It extends the effects of local atomic commit actions to distributed transactions by insisting that all sites involved in the execution of a distributed transaction agree to commit the transaction before its effects are made permanent. There are a number of reasons why such synchronization among sites is necessary. First, depending on the type of concurrency control algorithm that is used, some schedulers may not be ready to terminate a transaction. For example, if a transaction has read a value of a data item that is updated by another transaction that has not yet committed, the associated scheduler may not want to commit the former. Of course, strict concurrency control algorithms that avoid cascading aborts would not permit the updated value of a data item to be read by any other transaction until the updating transaction terminates. This is sometimes called the recoverability condition.
A brief description of the 2PC protocol that does not consider failures is as follows. Initially, the coordinator writes a begin commit record in its log, sends a “prepare” message to all participant sites, and enters the WAIT state. When a participant receives a “prepare” message, it checks if it could commit the transaction. If so, the participant writes a ready record in the log, sends a “vote-commit” message to the coordinator, and enters READY state; otherwise, the participant writes an abort record and sends a “vote-abort” message to the coordinator. If the decision of the site is to abort, it can forget about that transaction, since an abort decision serves as a veto (i.e., unilateral abort). After the coordinator has received a reply from every participant, it decides whether to commit or to abort the transaction. If even one participant has registered a negative vote, the coordinator has to abort the transaction globally. So it writes an abort record, sends a “global-abort” message to all participant sites, and enters the ABORT state; otherwise, it writes a commit record, sends a “global-commit” message to all participants, and enters the COMMIT state. The participants either commit or abort the transaction according to the coordinator’s instructions and send back an acknowledgment, at which point the coordinator terminates the transaction by writing an end of transaction record in the log.
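The failure-free flow described above can be condensed into a short Python sketch. The participant vote functions, the log list, and all the names are illustrative stand-ins, not a real DDBMS interface.

    # Failure-free 2PC sketch following the description above. Participants
    # are modeled as vote functions; the log is just a list.
    def two_phase_commit(participants, log):
        # Phase 1: coordinator solicits votes.
        log.append("begin_commit")                 # coordinator enters WAIT
        votes = []
        for name, can_commit in participants.items():
            if can_commit():                       # participant writes ready
                votes.append((name, "vote-commit"))
            else:                                  # unilateral abort = veto
                votes.append((name, "vote-abort"))
        # Phase 2: coordinator decides and broadcasts.
        if all(v == "vote-commit" for _, v in votes):
            log.append("commit")                   # enter COMMIT
            decision = "global-commit"
        else:
            log.append("abort")                    # enter ABORT
            decision = "global-abort"
        # Participants act on the decision and acknowledge.
        acks = [name for name, _ in votes]
        log.append("end_of_transaction")           # all acks received
        return decision, acks

    log = []
    decision, _ = two_phase_commit(
        {"site1": lambda: True, "site2": lambda: False}, log)
    print(decision, log)   # global-abort, since site2 vetoed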
State Transitions in 2PC Protocol
Q (B) Explain Top-down and Bottom-up design strategies. [07]
A (B) Top-Down Design Process
• Conceptual design of the data is the ER model of the whole enterprise
• Federated by the views
• Must anticipate new views/usages
• Must describe semantics of the data as used in the domain/enterprise
• This is almost identical to typical DB design
• However, we are concerned with Distribution Design
  We need to place tables "geographically" on the network
  We also need to fragment tables
Bottom-Up Design Process
• Top-down design is the choice when you have the liberty of starting from scratch
  Unfortunately, this is not usually the case
  Some element of bottom-up design is more common
• Bottom-up design is integrating independent/semi-independent schemas into a Global Conceptual Schema (GCS)
  Must deal with schema mapping issues
  May deal with heterogeneous integration issues
OR
RAID 1
• RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of the data to all the disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of a failure.
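As a toy sketch of the mirroring idea (illustrative only; a real controller works at the block and device level, and the Raid1 class below is invented for this note), every write goes to all disks so any surviving mirror can serve a read:

# Toy illustration of RAID 1 mirroring. Not a real controller.

class Raid1:
    def __init__(self, n_disks=2):
        self.disks = [dict() for _ in range(n_disks)]   # block_no -> data

    def write(self, block_no, data):
        for disk in self.disks:            # 100% redundancy: copy to every disk
            disk[block_no] = data

    def read(self, block_no, failed=()):
        for i, disk in enumerate(self.disks):
            if i not in failed:            # any working mirror can answer
                return disk[block_no]
        raise IOError("all mirrors failed")

r = Raid1()
r.write(7, b"payload")
print(r.read(7, failed={0}))               # read survives the loss of disk 0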
RAID 2
• RAID Level 2 uses the concept of a parallel access technique. It works on the word (byte) level, so each strip stores one bit. It takes data striping to the extreme, writing only 1 bit per strip instead of arbitrary-size blocks. For this reason, it requires a minimum of 8 surfaces to write data to the hard disk.
• In RAID level 2, strips are very small, so when a block is read, all disks are accessed in parallel.
• Hamming code generation is time consuming; therefore RAID level 2 is too slow for most commercial applications.
(1) The Operation-based Approach
• Principle:
– Record all changes made to the state of a process (‘audit trail’ or ‘log’) such that the process can be returned to a previous state
– Example: A transaction-based environment where transactions update a database
– It is possible to commit or undo updates on a per-transaction basis
– A commit indicates that the transaction on the object was successful and changes are permanent
(2) State-based Approach
• Principle: establish frequent ‘recovery points’ or ‘checkpoints’ saving the entire state of the process
• Actions:
– ‘Checkpointing’ or ‘taking a checkpoint’: saving process state
– ‘Rolling back’ a process: restoring a process to a prior state
Log Based Recovery
• When failures occur, the following operations that use the log are executed (a small sketch follows):
• UNDO: restore the database to the state prior to execution.
• REDO: perform the changes to the database over again.
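A hedged Python sketch of UNDO/REDO replay from a log after a crash. The log record format and the sample values are assumptions made for illustration:

# Replay a log after a crash: REDO committed work, UNDO uncommitted work.

log = [
    ("begin",  "T1"),
    ("write",  "T1", "x", 0, 10),   # (op, txn, item, old value, new value)
    ("commit", "T1"),
    ("begin",  "T2"),
    ("write",  "T2", "y", 5, 99),   # T2 never commits before the crash
]

db = {"x": 0, "y": 99}              # state found on disk after the crash
committed = {rec[1] for rec in log if rec[0] == "commit"}

for rec in log:                     # REDO pass (forward): committed transactions
    if rec[0] == "write" and rec[1] in committed:
        _, txn, item, old, new = rec
        db[item] = new              # re-apply after-image

for rec in reversed(log):           # UNDO pass (backward): uncommitted transactions
    if rec[0] == "write" and rec[1] not in committed:
        _, txn, item, old, new = rec
        db[item] = old              # restore before-image

print(db)                           # {'x': 10, 'y': 5}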
Write-Ahead Logging Protocol
• Write-ahead logging (WAL) is a family of techniques for providing atomicity and durability in database systems.
• In a system using WAL, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log.
• The purpose of this can be illustrated by an example. Imagine a program that is in the middle of performing some operation when the machine it is running on loses power. Upon restart, that program might well need to know whether the operation it was performing succeeded, half-succeeded, or failed. If a write-ahead log is used, the program can check this log and compare what it was supposed to be doing when it unexpectedly lost power to what was actually done. On the basis of this comparison, the program could decide to undo what it had started, complete what it had started, or keep things as they are.
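A minimal sketch of the WAL discipline, assuming a simple one-JSON-record-per-line log format invented for this example. The essential point is only the ordering: the log record is forced to stable storage before the in-place change is applied.

import json
import os

def wal_update(log_file, db, item, new_value):
    record = {"item": item, "undo": db.get(item), "redo": new_value}
    log_file.write(json.dumps(record) + "\n")
    log_file.flush()
    os.fsync(log_file.fileno())   # force the log record to disk first...
    db[item] = new_value          # ...only then apply the modification in place

db = {"x": 1}
with open("wal.log", "a") as log_file:
    wal_update(log_file, db, "x", 2)
print(db)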
Q (B) What is Semantic Integrity Control? Explain it in the context of centralized and distributed environments. 07
A (B) Semantic Integrity Control
• Semantic integrity control ensures database consistency by rejecting update transactions that lead to inconsistent database states, or by activating specific actions on the database state, which compensate for the effects of the update transactions.
• Two main types of integrity constraints can be distinguished: structural constraints and behavioral constraints.
• Structural constraints express basic semantic properties inherent to a model. Examples of such constraints are unique key constraints in the relational model, or one-to-many associations between objects in the object-oriented model.
• Behavioral constraints are essential in the database design process. They can express associations between objects, such as inclusion dependency in the relational model, or describe object properties and structures.
Centralized Semantic Integrity Control
• Specification of Integrity Constraints
• Triggers (event-condition-action rules) can be used to automatically propagate updates,
and thus to maintain semantic integrity.
• We can distinguish between three types of integrity constraints: predefined, precondition, or general constraints.
• EMP(ENO, ENAME, TITLE)
• PROJ(PNO, PNAME, BUDGET)
• ASG(ENO, PNO, RESP, DUR)
• Predefined constraints are based on simple keywords. Through them, it is possible to express concisely the more common constraints of the relational model, such as non-null attribute, unique key, foreign key, or functional dependency.
• Employee number in relation EMP cannot be null.
ENO NOT NULL IN EMP
• The project number PNO in relation ASG is a foreign key matching the primary key PNO of relation PROJ.
PNO IN ASG REFERENCES PNO IN PROJ
• Precondition constraints express conditions that must be satisfied by all tuples in a relation for a given update type. The update type, which might be INSERT, DELETE, or MODIFY, permits restricting the integrity control.
• Precondition constraints can be expressed with the SQL CHECK statement enriched with the ability to specify the update type.
CHECK ON <relation name> WHEN <update type>
(<qualification over relation name>)
• The budget of a project is between 500K and 1000K.
CHECK ON PROJ (BUDGET >= 500000 AND BUDGET <= 1000000)
• Only the tuples whose budget is 0 may be deleted.
CHECK ON PROJ WHEN DELETE (BUDGET = 0)
• General constraints are formulas of tuple relational calculus where all variables are quantified. The database system must ensure that those formulas are always true.
CHECK ON list of <variable name>:<relation name>, (<qualification>)
• The total duration for all employees in the CAD project is less than 100.
CHECK ON g:ASG, j:PROJ (SUM(g.DUR WHERE g.PNO=j.PNO)<100 IF j.PNAME="CAD/CAM")
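A small Python sketch (my own illustration of the assertion above, with invented sample tuples) that evaluates the CAD/CAM constraint over in-memory relations:

# Check the general constraint: total DUR assigned to any project named
# "CAD/CAM" must stay below 100. Sample data is invented for illustration.

ASG  = [{"ENO": "E1", "PNO": "P1", "DUR": 40},
        {"ENO": "E2", "PNO": "P1", "DUR": 50},
        {"ENO": "E3", "PNO": "P2", "DUR": 80}]
PROJ = [{"PNO": "P1", "PNAME": "CAD/CAM"},
        {"PNO": "P2", "PNAME": "Maintenance"}]

def check_cad_cam(asg, proj):
    for j in proj:
        if j["PNAME"] == "CAD/CAM":
            total = sum(g["DUR"] for g in asg if g["PNO"] == j["PNO"])
            if total >= 100:
                return False      # an update violating the assertion is rejected
    return True

print(check_cad_cam(ASG, PROJ))   # True: 40 + 50 < 100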
Distributed Semantic Integrity Control
• Definition of Distributed Integrity Constraints
• Assertions can involve data stored at different sites, so the storage of the constraints must be decided so as to minimize the cost of integrity checking. There is a strategy based on a taxonomy of integrity constraints that distinguishes three classes:
• Individual constraints: single-relation single-variable constraints. They refer only to tuples to be updated independently of the rest of the database.
• Set-oriented constraints: include single-relation multivariable constraints such as
functional dependency and multirelation multivariable constraints such as foreign key constraints.
• Constraints involving aggregates: require special processing because of the cost of evaluating the aggregates.
Individual constraints
• Consider relation EMP, horizontally fragmented across three sites using the predicates below, and the domain constraint C: ENO < “E4”.
p1 : ENO < “E3”
p2 : “E3” ≤ ENO ≤ “E6”
p3 : ENO > “E6”
• Constraint C is compatible with p1 (if C is true, p1 is not necessarily false) and p2 (if C is true, p2 is not necessarily false), but not with p3 (if C is true, then p3 is false). Therefore, constraint C should be globally rejected because the tuples at site 3 cannot satisfy C, and thus relation EMP does not satisfy C.
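A hedged sketch that tests the compatibility of constraint C with each fragmentation predicate by sampling ENO values (an illustration only, not a general implication prover):

# Compatibility check by sampling: a predicate is compatible with C if some
# value can satisfy both. Sample values are invented for this illustration.

SAMPLES = [f"E{i}" for i in range(1, 10)]          # "E1" .. "E9"

C  = lambda eno: eno < "E4"
p1 = lambda eno: eno < "E3"
p2 = lambda eno: "E3" <= eno <= "E6"
p3 = lambda eno: eno > "E6"

def compatible(pred):
    return any(C(e) and pred(e) for e in SAMPLES)

for name, p in [("p1", p1), ("p2", p2), ("p3", p3)]:
    print(name, "compatible" if compatible(p) else "incompatible")
# p1 and p2 are compatible; p3 is incompatible, so C is globally rejected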
Set-oriented constraints
• Set-oriented constraints are multivariable; that is, they involve join predicates.
• Three cases, given in increasing cost of checking, can occur:
1. The fragmentation of R is derived from that of S based on a semijoin on the attribute used in the assertion join predicate.
2. S is fragmented on the join attribute.
3. S is not fragmented on the join attribute.
• In the first case, compatibility checking is cheap since the tuple of S matching a tuple
of R is at the same site.
• In the second case, each tuple of R must be compared with at most one fragment of S,
because the join attribute value of the tuple of R can be used to find the site of the
corresponding fragment of S.
• In the third case, each tuple of R must be compared with all fragments of S. If compatibility is found for all tuples of R, the constraint can be stored at each site.
Constraints involving aggregates
• These constraints are among the most costly to test because they require the calculation of the aggregate functions.
• The aggregate functions generally manipulated are MIN, MAX, SUM, and COUNT.
• Each aggregate function contains a projection part and a selection part.
Q.5 (A) Explain Query Processing in Distributed Systems. 07
A.5 (A) Query Processing in Distributed Systems
• In a distributed DBMS the catalog has to store additional information including the location of relations and their replicas. The catalog must also include system-wide information such as the number of sites in the system along with their identifiers.
Mapping Global Query to Local Queries
• The tables required in a global query have fragments distributed across multiple sites. The local databases have information only about local data. The controlling site uses the global data dictionary to gather information about the distribution and reconstructs the global view from the fragments.
• If there is no replication, the global optimizer runs local queries at the sites where the fragments are stored. If there is replication, the global optimizer selects the site based upon communication cost, workload, and server speed.
• The global optimizer generates a distributed execution plan so that the least amount of data transfer occurs across the sites. The plan states the location of the fragments, the order in which query steps need to be executed, and the processes involved in transferring intermediate results.
• The local queries are optimized by the local database servers. Finally, the local query results are merged together through a union operation in case of horizontal fragments and a join operation for vertical fragments.
Example
• For example, let us consider that the following Project schema is horizontally fragmented according to City, the cities being New Delhi, Kolkata and Hyderabad.
• PROJECT
• Suppose there is a query to retrieve details of all projects whose status is “Ongoing”.
• The global query will be
σstatus="ongoing"(PROJECT)
• Query in New Delhi’s server will be
σstatus="ongoing"(NewD−PROJECT)
• Query in Kolkata’s server will be
σstatus="ongoing"(Kol−PROJECT)
• Query in Hyderabad’s server will be
σstatus="ongoing"(Hyd−PROJECT)
• In order to get the overall result, we need to union the results of the three queries as follows:
σstatus="ongoing"(NewD−PROJECT) ∪ σstatus="ongoing"(Kol−PROJECT) ∪ σstatus="ongoing"(Hyd−PROJECT)
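A short Python sketch of this mapping, with invented fragment contents: the same selection runs on each horizontal fragment, and the union of the local results reconstructs the global answer.

# Run the selection on each horizontal fragment, then union the results.
# Fragment data below is invented for illustration.

NewD_PROJECT = [{"PNO": "P1", "CITY": "New Delhi", "STATUS": "Ongoing"}]
Kol_PROJECT  = [{"PNO": "P2", "CITY": "Kolkata",   "STATUS": "Completed"}]
Hyd_PROJECT  = [{"PNO": "P3", "CITY": "Hyderabad", "STATUS": "Ongoing"}]

def select_ongoing(fragment):                     # sigma STATUS = "Ongoing"
    return [t for t in fragment if t["STATUS"] == "Ongoing"]

result = (select_ongoing(NewD_PROJECT)            # union of the three local results
          + select_ongoing(Kol_PROJECT)
          + select_ongoing(Hyd_PROJECT))
print(result)                                     # projects P1 and P3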
Q (B) Explain Deadlock in Distributed Systems. 07
A (B) Deadlock
• A deadlock can occur because transactions wait for one another. Informally, a deadlock situation is a set of requests that can never be granted by the concurrency control mechanism.
• A deadlock can be indicated by a cycle in the wait-for graph (WFG).
• In computer science, deadlock refers to a specific condition when two or more processes are each waiting for another to release a resource, or more than two processes are waiting for resources in a circular chain.
Deadlock Detection
• Deadlock detection is the process of actually determining that a deadlock exists and identifying the processes and resources involved in the deadlock.
• Detection of a cycle in the WFG proceeds concurrently with normal operation in the main approaches of deadlock detection; a small cycle-detection sketch follows.
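A hedged sketch of deadlock detection as a cycle search over the WFG (a depth-first search; the example graph is invented):

# Edge T1 -> T2 means "transaction T1 waits for T2". A cycle means deadlock.

def has_cycle(wfg):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wfg}

    def dfs(t):
        color[t] = GRAY
        for u in wfg.get(t, ()):
            if color.get(u, WHITE) == GRAY:   # back edge -> cycle -> deadlock
                return True
            if color.get(u, WHITE) == WHITE and dfs(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in list(wfg))

wfg = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}  # circular wait
print(has_cycle(wfg))                             # True -> deadlock detected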
Deadlock Prevention
• The deadlock prevention approach does not allow any transaction to acquire locks that will lead to deadlocks. The convention is that when more than one transaction requests a lock on the same data item, only one of them is granted the lock.
• One of the most popular deadlock prevention methods is pre-acquisition of all the locks.
Deadlock Avoidance
• The deadlock avoidance approach handles deadlocks before they occur. It analyzes the transactions and the locks to determine whether or not waiting leads to a deadlock.
• There are two algorithms for this purpose, namely wait-die and wound-wait.
• Let us assume that there are two transactions, T1 and T2, where T1 tries to lock a data item which is already locked by T2. The algorithms are as follows, with a worked sketch after this list −
• Wait-Die − If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger than T2, T1 is aborted and later restarted.
• Wound-Wait − If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1 is younger than T2, T1 is allowed to wait.
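A worked sketch of the two schemes, assuming (as is conventional) that a smaller timestamp means an older transaction:

# Timestamp-based deadlock prevention rules.

def wait_die(ts_requester, ts_holder):
    # older requester waits; younger requester dies (aborts, restarts later)
    return "wait" if ts_requester < ts_holder else "abort requester"

def wound_wait(ts_requester, ts_holder):
    # older requester wounds (aborts) the lock holder; younger requester waits
    return "abort holder" if ts_requester < ts_holder else "wait"

# T1 (ts=5) requests an item locked by T2 (ts=9): T1 is the older transaction
print(wait_die(5, 9))     # wait
print(wound_wait(5, 9))   # abort holder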
OR
Q.5 (A) What is authorization control? How do you apply authorization control in a distributed environment? 07
A.5 (A) Authorization control
• Authorization control consists of checking whether a given triple (subject, operation, object) can be allowed to proceed.
• The introduction of a subject in the system is typically done by a pair (user name, password).
• The objects to protect are subsets of the database. Relational systems provide finer and
more general protection granularity than do earlier systems.
• A right expresses a relationship between a subject and an object for a particular set of
operations.
GRANT <operation type(s)> ON <object> TO <subject(s)>
REVOKE <operation type(s)> FROM <object> TO <subject(s)>
Multilevel Access Control
• Discretionary access control has some limitations. One problem is that a malicious user can access unauthorized data through an authorized user.
• For instance, consider user A who has authorized access to relations R and S and user B who has authorized access to relation S only. If B somehow manages to modify an application program used by A so it writes R data into S, then B can read unauthorized data without violating authorization rules.
• Multilevel access control answers this problem and further improves security by defining different security levels for both subjects and data objects.
• A process has a security level, also called clearance, derived from that of the user.
• In its simplest form, the security levels are Top Secret (TS), Secret (S), Confidential (C) and Unclassified (U), ordered as TS > S > C > U, where “>” means “more secure”.
• Access in read and write modes by subjects is restricted by two simple rules:
Rule 1 (called “no read up”)
• protects data from unauthorized disclosure, i.e., a subject at a given security level can only read objects at the same or lower security levels.
Rule 2 (called “no write down”)
• protects data from unauthorized change, i.e., a subject at a given security level can only write objects at the same or higher security levels.
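A small sketch of the two rules above over the TS > S > C > U ordering (Bell-LaPadula style; the numeric LEVELS encoding is an assumption of this illustration):

# "No read up" and "no write down" checks over the security lattice.

LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def can_read(subject, obj):
    # Rule 1, "no read up": read only at the same or lower levels
    return LEVELS[subject] >= LEVELS[obj]

def can_write(subject, obj):
    # Rule 2, "no write down": write only at the same or higher levels
    return LEVELS[subject] <= LEVELS[obj]

print(can_read("S", "C"))    # True: a Secret subject may read Confidential data
print(can_write("S", "C"))   # False: writing down could leak Secret data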
Distributed Access Control
• The additional problems of access control in a distributed environment stem from the
fact that objects and subjects are distributed and that messages with sensitive data can
be read by unauthorized users.
• These problems are: remote user authentication, management of discretionary access
rules, handling of views and of user groups, and enforcing multilevel access control.
• Remote user authentication is necessary since any site of a distributed DBMS may
accept programs initiated, and authorized, at remote sites.
• Three solutions are possible for managing authentication:
1. Authentication information is maintained at a central site for global users, who can then be authenticated only once and then accessed from multiple sites.
2. The information for authenticating users (user name and password) is replicated at all sites in the catalog.
3. Intersite communication is protected by the use of a site password. Once the initiating site has been authenticated, there is no need for authenticating its remote users.
Q (B) Write a short note on Query Optimization. 07
A (B) Query optimization refers to the process of producing a query execution plan (QEP) which represents an execution strategy for the query. This QEP minimizes an objective cost function. A query optimizer, the software module that performs query optimization, is usually seen as consisting of three components: a search space, a cost model, and a search strategy. The search space is the set of alternative execution plans that represent the input query. These plans are equivalent, in the sense that they yield the same result, but they differ in the execution order of operations and the way these operations are implemented, and therefore in their performance. The cost model predicts the cost of a given execution plan. To be accurate, the cost model must have good knowledge about the distributed execution environment. The search strategy explores the search space and selects the best plan, using the cost model. It defines which plans are examined and in which order. The details of the environment (centralized versus distributed) are captured by the search space and the cost model.
1 Search Space
Query execution plans are typically abstracted by means of operator trees, which define the order in which the operations are executed. They are enriched with additional information, such as the best algorithm chosen for each operation. For a given query, the search space can thus be defined as the set of equivalent operator trees that can be produced using transformation rules. To characterize query optimizers, it is useful to concentrate on join trees, which are operator trees whose operators are join or Cartesian product. This is because permutations of the join order have the most important effect on performance of relational queries.
2 Search Strategy
The most popular search strategy used by query optimizers is dynamic programming, which is deterministic. Deterministic strategies proceed by building plans, starting from base relations, joining one more relation at each step until complete plans are obtained. Dynamic programming builds all possible plans, breadth-first, before it chooses the “best” plan. To reduce the optimization cost, partial plans that are not likely to lead to the optimal plan are pruned (i.e., discarded) as soon as possible. By contrast, another deterministic strategy, the greedy algorithm, builds only one plan, depth-first. A small sketch of the dynamic-programming approach follows.
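A hedged Python sketch of dynamic-programming join ordering: plans are built breadth-first over subsets, and only the cheapest plan per subset is kept (the pruning). The base-relation cardinalities, the 1/10 selectivity, and the product-based cost are toy assumptions made for illustration:

# Dynamic-programming join enumeration over a toy cost model.

from itertools import combinations

CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}     # toy base-relation cardinalities

def dp_join_order(relations):
    # best[S] = (cost, plan, cardinality) for each non-empty subset S
    best = {frozenset([r]): (0, r, CARD[r]) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            candidates = []
            for r in subset:                      # extend a smaller plan by one relation
                rest = subset - {r}
                if rest in best:
                    cost, plan, card = best[rest]
                    join_card = card * CARD[r] // 10   # toy selectivity of 1/10
                    candidates.append((cost + card * CARD[r],
                                       f"({plan} JOIN {r})", join_card))
            best[subset] = min(candidates)        # prune: keep cheapest plan per subset
    return best[frozenset(relations)]

cost, plan, _ = dp_join_order(["EMP", "ASG", "PROJ"])
print(plan, "cost =", cost)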
3 Distributed Cost Model
An optimizer’s cost model includes cost functions to predict the cost of operators, statistics and base data, and formulas to evaluate the sizes of intermediate results. The cost is in terms of execution time, so a cost function represents the execution time of a query.
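As a concrete instance, the total cost is commonly broken down (following Özsu and Valduriez) into CPU, I/O, and communication components:

Total cost = T_CPU × #insts + T_I/O × #I/Os + T_MSG × #msgs + T_TR × #bytes

where T_CPU is the time of a CPU instruction, T_I/O the time of a disk I/O, T_MSG the fixed time of initiating and receiving a message, and T_TR the time to transmit a unit of data between two sites.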
*****Best of Luck*****
Kamal Acharya
 
Query optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsQuery optimization and processing for advanced database systems
Query optimization and processing for advanced database systems
meharikiros2
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
AldoGarca30
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 

Recently uploaded (20)

Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptx
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Signal Processing and Linear System Analysis
Signal Processing and Linear System AnalysisSignal Processing and Linear System Analysis
Signal Processing and Linear System Analysis
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...
 
Query optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsQuery optimization and processing for advanced database systems
Query optimization and processing for advanced database systems
 
Ground Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth ReinforcementGround Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth Reinforcement
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
fitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptfitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .ppt
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 

DDBMS Paper with Solution

Faculty of Degree Engineering - 083
Department of CE/IT (07/16)
Enrollment No: _____________________ Seat No: _____________________
B.E. – SEMESTER – VII PRE-FINAL EXAMINATION – 2016 (ODD)
Subject Code: 2170714 Date: 17-10-2016
Subject Name: Distributed DBMS Time: 10:00 AM to 12:30 PM
Duration: 2:30 hr. Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions where necessary.
3. Figures to the right indicate full marks.

Q. 1 (A) Explain the potential problems with DDBMS. [07]

A. 1 (A)
1.6.1 Distributed Database Design
The question being addressed is how the database and the applications that run against it should be placed across the sites. There are two basic alternatives to placing data: partitioned (or non-replicated) and replicated. In the partitioned scheme the database is divided into a number of disjoint partitions, each of which is placed at a different site. Replicated designs can be either fully replicated (also called fully duplicated), where the entire database is stored at each site, or partially replicated (or partially duplicated), where each partition of the database is stored at more than one site, but not at all the sites. The two fundamental design issues are fragmentation, the separation of the database into partitions called fragments, and distribution, the optimum distribution of fragments. The research in this area mostly involves mathematical programming in order to minimize the combined cost of storing the database, processing transactions against it, and message communication among sites. The general problem is NP-hard. Therefore, the proposed solutions are based on heuristics.

1.6.2 Distributed Directory Management
A directory contains information (such as descriptions and locations) about data items in the database. Problems related to directory management are similar in nature to the database placement problem discussed in the preceding section. A directory may be global to the entire DDBS or local to each site; it can be centralized at one site or distributed over several sites; there can be a single copy or multiple copies.

1.6.3 Distributed Query Processing
Query processing deals with designing algorithms that analyze queries and convert them into a series of data manipulation operations. The problem is how to decide on a strategy for executing each query over the network in the most cost-effective way, however cost is defined. The factors to be considered are the distribution of data, communication costs, and the lack of sufficient locally available information. The objective is to optimize where the inherent parallelism is used to improve the performance of executing the transaction, subject to the above-mentioned constraints. The problem is NP-hard in nature, and the approaches are usually heuristic.

1.6.4 Distributed Concurrency Control
Concurrency control involves the synchronization of accesses to the distributed database, such that the integrity of the database is maintained. It is, without any doubt, one of the most extensively studied problems in the DDBS field. The concurrency control problem in a distributed context is somewhat different than in a centralized framework. One not only has to worry about the integrity of a single database, but
also about the consistency of multiple copies of the database. The condition that requires all the values of multiple copies of every data item to converge to the same value is called mutual consistency.

1.6.5 Distributed Deadlock Management
The deadlock problem in DDBSs is similar in nature to that encountered in operating systems. The competition among users for access to a set of resources (data, in this case) can result in a deadlock if the synchronization mechanism is based on locking. The well-known alternatives of prevention, avoidance, and detection/recovery also apply to DDBSs.

1.6.6 Reliability of Distributed DBMS
We mentioned earlier that one of the potential advantages of distributed systems is improved reliability and availability. This, however, is not a feature that comes automatically. It is important that mechanisms be provided to ensure the consistency of the database as well as to detect failures and recover from them. The implication for DDBSs is that when a failure occurs and various sites become either inoperable or inaccessible, the databases at the operational sites remain consistent and up to date. Furthermore, when the computer system or network recovers from the failure, the DDBSs should be able to recover and bring the databases at the failed sites up-to-date. This may be especially difficult in the case of network partitioning, where the sites are divided into two or more groups with no communication among them.

1.6.7 Replication
If the distributed database is (partially or fully) replicated, it is necessary to implement protocols that ensure the consistency of the replicas, i.e., that copies of the same data item have the same value. These protocols can be eager, in that they force the updates to be applied to all the replicas before the transaction completes, or they may be lazy, so that the transaction updates one copy (called the master), from which updates are propagated to the others after the transaction completes.

1.6.8 Relationship among Problems
The design of distributed databases affects many areas. It affects directory management, because the definition of fragments and their placement determine the contents of the directory (or directories) as well as the strategies that may be employed to manage them. The same information (i.e., fragment structure and placement) is used by the query processor to determine the query evaluation strategy. On the other hand, the access and usage patterns that are determined by the query processor are used as inputs to the data distribution and fragmentation algorithms. Similarly, directory placement and contents influence the processing of queries. There is a strong relationship among the concurrency control problem, the deadlock management problem, and reliability issues. This is to be expected, since together they are usually called the transaction management problem. The concurrency control algorithm that is employed will determine whether or not a separate deadlock management facility is required. If a locking-based algorithm is used, deadlocks will occur, whereas they will not if timestamping is the chosen alternative.
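The eager/lazy distinction above can be made concrete with a small sketch; everything here (the class, the pending queue, all names) is invented for illustration and is not part of the original answer:

# Illustrative sketch of eager vs. lazy replica update propagation.
class Replica:
    def __init__(self):
        self.data = {}
    def apply(self, key, value):
        self.data[key] = value

def eager_write(replicas, key, value):
    # Eager: every replica is updated before the transaction completes.
    for r in replicas:
        r.apply(key, value)

def lazy_write(master, pending, key, value):
    # Lazy: only the master copy is updated inside the transaction;
    # propagation to the other replicas happens after commit.
    master.apply(key, value)
    pending.append((key, value))      # queued for later propagation

def propagate(slaves, pending):
    # Runs after the transaction completes.
    for key, value in pending:
        for s in slaves:
            s.apply(key, value)
    pending.clear()

master, slave, pending = Replica(), Replica(), []
lazy_write(master, pending, "x", 1)   # slave is stale until propagation
propagate([slave], pending)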
Q (B) Explain RAID Level Recovery Technique. [07]

A (B) RAID, or Redundant Array of Independent Disks, is a technology used to connect multiple secondary storage devices and use them as a single storage medium. RAID consists of an array of disks in which multiple disks are connected together to achieve different goals. RAID levels define the use of disk arrays.

RAID 0
In this level, a striped array of disks is implemented. The data is broken down into blocks and the blocks are distributed among the disks. Each disk receives a block of data to write/read in parallel. This enhances the speed and performance of the storage device. There is no parity and no backup in Level 0.
RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of the data to all the disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of a failure.

RAID 2
• RAID level 2 uses the concept of the parallel access technique. It works at the word (byte) level, so each strip stores one bit. It takes data striping to the extreme, writing only 1 bit per strip instead of blocks of arbitrary size. For this reason, it requires a minimum of 8 surfaces to write data to the hard disk.
• In RAID level 2, strips are very small, so when a block is read, all disks are accessed in parallel.
• Hamming-code generation is time consuming; therefore RAID level 2 is too slow for most commercial applications.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is stored on a separate disk. This technique makes it possible to overcome single-disk failures.

RAID 4
In this level, an entire block of data is written onto data disks and then the parity is generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping. Both level 3 and level 4 require at least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for a data-block stripe are distributed among all the data disks rather than being stored on a dedicated parity disk.

RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated and stored in distributed fashion among multiple disks. The two parities provide additional fault tolerance. This level requires at least four disk drives to implement RAID.
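The parity-based levels (RAID 3, 4, 5 and, with a second parity, 6) all rest on the same XOR property: the parity block is the XOR of the data blocks, so any one lost block can be rebuilt from the survivors. A minimal Python sketch of this recovery idea (illustrative only, not part of the original answer):

# XOR parity as used by RAID 3/4/5: parity = d0 ^ d1 ^ ... ^ d(n-1),
# so a single failed disk can be reconstructed from the remaining ones.
from functools import reduce

def parity(blocks):
    # XOR all blocks byte by byte to produce the parity block.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    # XOR of the surviving data blocks and the parity block
    # yields the contents of the lost block.
    return parity(surviving_blocks + [parity_block])

d0, d1, d2 = b"ABCD", b"EFGH", b"IJKL"   # stripes on three data disks
p = parity([d0, d1, d2])                 # stored on the parity disk
assert rebuild([d0, d2], p) == d1        # disk 1 failed; recover it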
Q. 2 (A) What is Concurrency? List out the methods of concurrency control and explain any one of them. [07]

A. 2 (A) Concurrency: In computer science, concurrency is the execution of several instruction sequences at the same time.
• In a distributed database system, the database is typically used by many users. These systems usually allow multiple transactions to run concurrently at the same time.
• The system must support the parallel execution of transactions.
• Communication delay is low.
• The system must be able to recover from site and communication failures.

(Figure: methods of concurrency control, e.g., locking-based and timestamp-based approaches.)

Locking-Based Concurrency Control
• The main idea of locking-based concurrency control is to ensure that a data item shared by conflicting operations is accessed by only one operation at a time.
• A lock is set by a transaction before the data item is accessed and is reset at the end of its use.
• There are two types of locks: read lock (rl) and write lock (wl).

(Figure: locking-based concurrency control algorithms.)
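As a rough illustration of the read/write lock rule above, here is a minimal lock-manager sketch; the names are hypothetical, and a real scheduler would queue conflicting requests and enforce the two-phase rule rather than simply refusing them:

# Minimal lock-manager sketch: shared read locks (rl), exclusive
# write locks (wl). Conflicting requests are refused here; a real
# DDBMS would block or queue the requesting transaction instead.
class LockManager:
    def __init__(self):
        self.readers = {}   # item -> set of transaction ids
        self.writer = {}    # item -> transaction id

    def read_lock(self, txn, item):
        w = self.writer.get(item)
        if w is not None and w != txn:
            return False                      # rl conflicts with wl
        self.readers.setdefault(item, set()).add(txn)
        return True

    def write_lock(self, txn, item):
        w = self.writer.get(item)
        rs = self.readers.get(item, set())
        if (w is not None and w != txn) or (rs - {txn}):
            return False                      # wl conflicts with any other lock
        self.writer[item] = txn
        return True

    def release(self, txn, item):
        # In strict 2PL this happens only at commit/abort time.
        self.readers.get(item, set()).discard(txn)
        if self.writer.get(item) == txn:
            del self.writer[item]

lm = LockManager()
assert lm.read_lock("T1", "x")
assert lm.read_lock("T2", "x")       # read locks are compatible
assert not lm.write_lock("T2", "x")  # T1 still holds a read lock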
(Figure: 2PL lock graph.)

Q (B) Explain the layers of query processing. [07]

A (B)
Query Decomposition
• The first layer decomposes the calculus query into an algebraic query on global relations. The information needed for this transformation is found in the global conceptual schema describing the global relations.
• Query decomposition can be viewed as four successive steps.
• First, the calculus query is rewritten in a normalized form that is suitable for subsequent manipulation. Normalization of a query generally involves the manipulation of the query quantifiers and of the query qualification by applying logical operator priority.
• Second, the normalized query is analyzed semantically so that incorrect queries are detected and rejected as early as possible. Techniques to detect incorrect queries exist only for a subset of relational calculus. Typically, they use some sort of graph that captures the semantics of the query.
• Third, the correct query (still expressed in relational calculus) is simplified. One way to simplify a query is to eliminate redundant predicates. Note that redundant queries are likely to arise when a query is the result of system transformations applied to the user query. Such transformations are used for performing semantic data control (views, protection, and semantic integrity control).
• Fourth, the calculus query is restructured as an algebraic query. The traditional way to do this transformation toward a "better" algebraic specification is to start with an initial algebraic query and transform it in order to find a "good" one.
• The algebraic query generated by this layer is good in the sense that the worst executions are typically avoided.

Data Localization
• The input to the second layer is an algebraic query on global relations. The main role of the second layer is to localize the query's data using the data distribution information in the fragment schema.
• This layer determines which fragments are involved in the query and transforms the distributed query into a query on fragments.
• A global relation can be reconstructed by applying the fragmentation rules, and
then deriving a program, called a localization program, of relational algebra operators, which then act on fragments.
• Generating a query on fragments is done in two steps.
• First, the query is mapped into a fragment query by substituting each relation by its reconstruction program (also called a materialization program).
• Second, the fragment query is simplified and restructured to produce another "good" query.

Global Query Optimization
• The input to the third layer is an algebraic query on fragments. The goal of query optimization is to find an execution strategy for the query which is close to optimal.
• The previous layers have already optimized the query, for example, by eliminating redundant expressions. However, this optimization is independent of fragment characteristics such as fragment allocation and cardinalities.
• Query optimization consists of finding the "best" ordering of operators in the query, including communication operators, that minimizes a cost function.
• The output of the query optimization layer is an optimized algebraic query, with communication operators included, on fragments. It is typically represented and saved (for future executions) as a distributed query execution plan.
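To see what data localization buys, consider a toy reduction sketch (the fragment names and predicates are invented, not from the paper): a query whose selection constant contradicts a fragment's placement predicate never needs to touch that fragment.

# Data localization sketch: EMP is horizontally fragmented on ENO.
# The generic query on EMP is first mapped to the union of its
# fragments, then reduced by dropping fragments that cannot
# contribute tuples to the selection predicate.
fragments = {
    "EMP1": lambda eno: eno <= "E3",   # placement predicate of EMP1
    "EMP2": lambda eno: eno > "E3",    # placement predicate of EMP2
}

def localize(query_eno):
    # Keep only the fragments whose placement predicate can be
    # satisfied by the query constant (a very simple compatibility test).
    return [name for name, pred in fragments.items() if pred(query_eno)]

print(localize("E5"))   # ['EMP2'] -- EMP1 is eliminated from the plan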
Minterm Properties
• Minterm selectivity: the number of records that satisfy a given minterm predicate.
• Example: sel(m1) = 1; sel(m2) = 1; sel(m4) = 0.
• Access frequency by applications and users: Q = {q1, q2, …, qq} is the set of queries, and acc(q1) is the frequency of access of query q1.

Algorithm for Determining Minterms
Rule 1: a fragment is partitioned into at least two parts that are accessed differently by at least one application.

Definitions:
R - relation
Pr - set of simple predicates
Pr' - another set of simple predicates
F - set of minterm fragments

COM_MIN(R, Pr) {  // compute minterms
  find a pi in Pr such that pi partitions R according to Rule 1
  Pr' = {pi}
  Pr = Pr - {pi}
  F = {fi}                          // fi is the minterm fragment according to pi
  while (Pr' is incomplete) {
    find a pj in Pr that partitions some fragment fk of Pr'
    Pr' = Pr' ∪ {pj}
    Pr = Pr - {pj}
    F = F ∪ {fj}                    // fj is the minterm fragment according to pj
    if (there is a pk in Pr' which is non-relevant) {   // this is complex
      Pr' = Pr' - {pk}
      F = F - {fk}
    }
  }
  return Pr'
}
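To make the minterm idea concrete, here is a small illustrative sketch (the predicates are invented, not from the paper) that enumerates the 2^n minterms of n simple predicates; every tuple satisfies exactly one of them:

# Minterm sketch: given simple predicates p1..pn, every minterm is a
# conjunction that takes each predicate either positively or negated.
from itertools import product

def minterms(predicates):
    # Each element of the result is a function testing one minterm.
    result = []
    for signs in product([True, False], repeat=len(predicates)):
        def m(t, signs=signs):
            return all(p(t) == s for p, s in zip(predicates, signs))
        result.append(m)
    return result

p1 = lambda t: t["sal"] <= 30000        # simple predicate on SAL
p2 = lambda t: t["loc"] == "Mumbai"     # simple predicate on LOC
ms = minterms([p1, p2])                 # 2^2 = 4 minterm predicates
print(sum(m({"sal": 20000, "loc": "Delhi"}) for m in ms))  # exactly 1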
Q. 3 (A) Explain the following in the context of Relational algebra: 1. Selection 2. Natural Join 3. Projection [07]

A. 3 (A)
1. Selection
• Produces a horizontal subset of the operand relation.
• General form: σF(R) = { t | t ∈ R and F(t) is true }, where R is a relation, t is a tuple variable, and F is a formula consisting of operands that are constants or attributes, the arithmetic comparison operators <, >, =, ≠, ≤, ≥, and the logical operators ∧, ∨, ¬.
• (Figure: selection example.)

2. Natural Join
• The equi-join of two relations R and S over an attribute (or attributes) common to both R and S, projecting out one copy of those attributes.
• R ⋈ S = Π(R∪S)(σF(R × S))
• (Figure: natural join example.)

3. Projection
• Produces a vertical slice of a relation.
• General form: Π(A1,…,An)(R) = { t[A1,…,An] | t ∈ R }, where R is a relation, t is a tuple variable, and {A1,…,An} is a subset of the attributes of R over which the projection will be performed.
• Note: projection can generate duplicate tuples. Commercial systems (and SQL) allow this and provide both projection with duplicate elimination and projection without duplicate elimination.
• (Figure: projection example.)
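These three operators are easy to prototype over relations represented as lists of dictionaries; the sketch below (toy data, no optimization) is for intuition only, not how a DBMS implements them:

# Toy relational algebra over relations as lists of dicts.
def selection(relation, predicate):
    # sigma_F(R): horizontal subset, keep tuples where F(t) is true.
    return [t for t in relation if predicate(t)]

def projection(relation, attrs):
    # pi_{A1..An}(R): vertical slice, with duplicate elimination.
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

def natural_join(r, s):
    # R join S: equi-join on the common attributes, one copy kept.
    # Assumes non-empty relations for simplicity.
    common = set(r[0]) & set(s[0])
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]

EMP = [{"eno": "E1", "ename": "A", "title": "Engineer"},
       {"eno": "E2", "ename": "B", "title": "Analyst"}]
ASG = [{"eno": "E1", "pno": "P1"}, {"eno": "E2", "pno": "P2"}]
print(selection(EMP, lambda t: t["title"] == "Analyst"))
print(projection(EMP, ["title"]))
print(natural_join(EMP, ASG))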
Q (B) Explain the Client-Server architecture for Distributed DBMS. [07]

A (B)
• This provides a two-level architecture which makes it easier to manage the complexity of modern DBMSs and the complexity of distribution.
• The server does most of the data management work (query processing and optimization, transaction management, storage management).
• The client is the application and the user interface (it manages the data that is cached at the client and manages the transaction locks).
• This architecture is quite common in relational systems, where the communication between the clients and the server(s) is at the level of SQL statements.
• Multiple client - single server: from a data management perspective, this is not much different from centralized databases, since the database is stored on only one machine (the server), which also hosts the software to manage it. However, there are some differences from centralized systems in the way transactions are executed and caches are managed.
• Multiple client - multiple server: in this case, two alternative management strategies are possible: either each client manages its own connection to the appropriate server, or each client knows of only its "home server", which then communicates with other servers as required.

OR

Q. 3 (A) Explain the ACID properties in the context of DDBMS. [07]

A. 3 (A)
• The consistency and reliability aspects of transactions are due to four properties:
1. Atomicity
2. Consistency
3. Isolation
4. Durability
• Together, these are commonly referred to as the ACID properties of transactions.

1. Atomicity
• Atomicity refers to the fact that a transaction is treated as a unit of operation. Therefore, either all the transaction's actions are completed, or none of them are. This is also known as the "all-or-nothing property."

2. Consistency
• The consistency of a transaction is simply its correctness. In other words, a transaction is a correct program that maps one consistent database state to another.

3. Isolation
• Isolation is the property of transactions that requires each transaction to see a consistent database at all times. In other words, an executing transaction cannot reveal its results to other concurrent transactions before its commitment.

4. Durability
• Durability refers to that property of transactions which ensures that once a transaction commits, its results are permanent and cannot be erased from the database.
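Atomicity is easiest to see in a local transaction. A minimal demonstration with Python's built-in sqlite3 module (a centralized stand-in for illustration, not a distributed example):

# Atomicity demo: either both updates of the transfer are applied,
# or neither is.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE acct (name TEXT PRIMARY KEY, bal INT)")
con.executemany("INSERT INTO acct VALUES (?, ?)", [("A", 100), ("B", 0)])
con.commit()

try:
    con.execute("UPDATE acct SET bal = bal - 50 WHERE name = 'A'")
    raise RuntimeError("simulated crash between the two steps")
    con.execute("UPDATE acct SET bal = bal + 50 WHERE name = 'B'")
    con.commit()
except RuntimeError:
    con.rollback()   # all-or-nothing: the first update is undone

print(list(con.execute("SELECT * FROM acct")))  # [('A', 100), ('B', 0)]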
Q (B) Explain the AA matrix, CA matrix, and BEA algorithm in the context of vertical fragmentation. [07]

A (B)
Determining Affinity
• The attribute usage matrix alone does not let us determine the affinity of the attributes, because we don't know the access frequency of the attribute groups.
• We need this to calculate the attribute affinity aff(Ai, Aj), which defines how often Ai and Aj are accessed together.
• It depends on the frequency of query requests for attributes Ai and Aj simultaneously.
Affinity Measure
• The attribute affinity between two attributes Ai and Aj of a relation is

  aff(Ai, Aj) = Σ (over all queries qk that access both Ai and Aj) Σ (over all sites Sm) refm(qk) · accm(qk)

  where refm(qk) is the number of accesses to attributes Ai and Aj for each execution of application qk at site Sm, and accm(qk) is the access frequency of qk at site Sm.

(Figure: attribute affinity matrix, example.)

Clustering Algorithm
• We want to find which attributes belong together in a vertically fragmented table.
• For a small case, examining the matrix by hand is sufficient.

Bond Energy Algorithm (BEA)
• Initialize: pick any column.
• Iterate: select the next column and try to place it in the matrix; choose the place that maximizes the global affinity; repeat.
• Repeat the process for rows. However, since AA is symmetric, just reorder the rows; we will reorder the columns later to create a symmetric matrix.
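Before the pseudocode, a small sketch of how the AA matrix can be computed from an attribute usage matrix and per-site access frequencies; the matrix values below are invented for illustration, and ref is taken as 1 for simplicity:

# Compute the attribute affinity matrix AA from the usage matrix
# use[q][a] (1 if query q uses attribute a) and per-site access
# frequencies acc[q][site].
use = {                      # queries x attributes A1..A4
    "q1": [1, 0, 1, 0],
    "q2": [0, 1, 1, 0],
    "q3": [0, 1, 0, 1],
}
acc = {                      # access frequency of each query per site
    "q1": [15, 20, 10],
    "q2": [5, 0, 0],
    "q3": [25, 25, 25],
}

n = 4
AA = [[0] * n for _ in range(n)]
for q, used in use.items():
    freq = sum(acc[q])       # sum over all sites (ref assumed 1)
    for i in range(n):
        for j in range(n):
            if used[i] and used[j]:
                AA[i][j] += freq

for row in AA:
    print(row)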
BEA Pseudocode

BondEnergyAlgorithm {
  input:  AA   // attribute affinity matrix (n × n)
  output: CA   // clustered attribute matrix

  // put the first two columns in CA
  CA(*, 1) <- AA(*, 1)
  CA(*, 2) <- AA(*, 2)
  index <- 3
  // for each of the remaining columns of AA, choose the best placement
  while (index <= n) {
    // calculate the contribution of each possible place for the new column
    for (i = 1; i < index; i++)                        // iterate over the placed columns
      calculate cont(A(i-1), A(index), A(i))
    calculate cont(A(index-1), A(index), A(index+1))   // boundary placement
    loc <- placement given by maximum cont()
    for (j = index; j > loc; j--)                      // shift columns to make room
      CA(*, j) <- CA(*, j-1)
    CA(*, loc) <- AA(*, index)
    index <- index + 1
  }
  // reorder the rows according to the placement of the columns
}
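The cont() used in the pseudocode is defined in terms of the bond between columns. A compact sketch following the textbook-style definitions (assuming the AA matrix from the previous step; boundary columns contribute a bond of 0):

# bond(Ax, Ay) = sum over z of AA[z][x] * AA[z][y]
# cont(Ai, Ak, Aj) = 2*bond(Ai,Ak) + 2*bond(Ak,Aj) - 2*bond(Ai,Aj)
def bond(AA, x, y):
    n = len(AA)
    if not (0 <= x < n and 0 <= y < n):
        return 0                      # imaginary boundary column
    return sum(AA[z][x] * AA[z][y] for z in range(n))

def cont(AA, i, k, j):
    # Net affinity gained by placing column k between columns i and j.
    return 2 * bond(AA, i, k) + 2 * bond(AA, k, j) - 2 * bond(AA, i, j)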
There are a number of reasons why such synchronization among sites is necessary. First, depending on the type of concurrency control algorithm that is used, some schedulers may not be ready to terminate a transaction. For example, if a transaction has read a value of a data item that is updated by another transaction that has not yet committed, the associated scheduler may not want to commit the former. Of course, strict concurrency control algorithms that avoid cascading aborts would not permit the updated value of a data item to be read by any other transaction until the updating transaction terminates. This is sometimes called the recoverability condition.
A brief description of the 2PC protocol that does not consider failures is as follows. Initially, the coordinator writes a begin_commit record in its log, sends a "prepare" message to all participant sites, and enters the WAIT state. When a participant receives a "prepare" message, it checks if it could commit the transaction. If so, the participant writes a ready record in the log, sends a "vote-commit" message to the coordinator, and enters the READY state; otherwise, the participant writes an abort record and sends a "vote-abort" message to the coordinator.
If the decision of the site is to abort, it can forget about that transaction, since an abort decision serves as a veto (i.e., unilateral abort). After the coordinator has received a reply from every participant, it decides whether to commit or to abort the transaction. If even one participant has registered a negative vote, the coordinator has to abort the transaction globally. So it writes an abort record, sends a "global-abort" message to all participant sites, and enters the ABORT state; otherwise, it writes a commit record, sends a "global-commit" message to all participants, and enters the COMMIT state. The participants either commit or abort the transaction according to the coordinator's instructions and send back an acknowledgment, at which point the coordinator terminates the transaction by writing an end_of_transaction record in the log.
State Transitions in 2PC Protocol
[Figure omitted: state-transition diagrams for the coordinator (INITIAL -> WAIT -> COMMIT or ABORT) and for a participant (INITIAL -> READY -> COMMIT or ABORT).]
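The exchange described above can be made concrete with a small, failure-free simulation. This is only a sketch: it runs in a single process with in-memory "messages", and the class and record names are illustrative rather than taken from any particular system:

    class Participant:
        def __init__(self, name, can_commit=True):
            self.name, self.can_commit = name, can_commit
            self.log, self.state = [], "INITIAL"

        def on_prepare(self):
            if self.can_commit:
                self.log.append("ready")       # write ready record, then vote
                self.state = "READY"
                return "vote-commit"
            self.log.append("abort")           # unilateral abort acts as a veto
            self.state = "ABORT"
            return "vote-abort"

        def on_decision(self, decision):
            self.state = "COMMIT" if decision == "global-commit" else "ABORT"
            self.log.append(self.state.lower())
            return "ack"

    def run_2pc(participants):
        coord_log = ["begin_commit"]                     # coordinator enters WAIT
        votes = [p.on_prepare() for p in participants]   # "prepare" sent to all
        if all(v == "vote-commit" for v in votes):
            decision, record = "global-commit", "commit"
        else:
            decision, record = "global-abort", "abort"   # one negative vote suffices
        coord_log.append(record)
        acks = [p.on_decision(decision) for p in participants]
        if len(acks) == len(participants):               # all acknowledgments received
            coord_log.append("end_of_transaction")
        return decision, coord_log

    decision, log = run_2pc([Participant("site1"), Participant("site2", can_commit=False)])
    print(decision, log)   # global-abort ['begin_commit', 'abort', 'end_of_transaction']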
Q (B) Explain Top-down and Bottom-up design strategies.
A (B) Top-Down Design Process
• Conceptual design of the data is the ER model of the whole enterprise, federated by the views
• Must anticipate new views/usages
• Must describe the semantics of the data as used in the domain/enterprise
• This is almost identical to typical DB design
• However, we are concerned with Distribution Design:
  – We need to place tables "geographically" on the network
  – We also need to fragment tables

Bottom-Up
• Top-down design is the choice when you have the liberty of starting from scratch. Unfortunately, this is not usually the case; some element of bottom-up design is more common
• Bottom-up design is integrating independent/semi-independent schemas into a Global Conceptual Schema (GCS)
• Must deal with schema mapping issues
• May deal with heterogeneous integration issues

OR
RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of the data to all the disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of a failure.

RAID 2
• RAID Level 2 uses the concept of the parallel access technique. It works on the word (byte) level, so each strip stores one bit. It takes data striping to the extreme, writing only 1 bit per strip instead of arbitrary-size blocks. For this reason, it requires a minimum of 8 surfaces to write data to the hard disk.
• In RAID level 2, strips are very small, so when a block is read, all disks are accessed in parallel.
• Hamming code generation is time consuming; therefore RAID level 2 is too slow for most commercial applications.
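The mirroring idea of RAID 1 can be illustrated with a toy sketch. Real controllers work at the block-device level; the in-memory "disks" here are purely hypothetical:

    # Two mirrored "disks", each mapping a block number to its data.
    disks = [dict(), dict()]

    def raid1_write(block, data):
        for d in disks:                 # controller sends a copy to every disk in the array
            d[block] = data

    def raid1_read(block):
        for d in disks:                 # 100% redundancy: any surviving disk can serve the read
            if block in d:
                return d[block]
        raise IOError("block lost on all mirrors")

    raid1_write(0, b"payload")
    disks[0].clear()                    # simulate failure of one disk
    print(raid1_read(0))                # still readable from the mirror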
(1) The Operation-based Approach
• Principle:
  – Record all changes made to the state of a process (an 'audit trail' or 'log') such that the process can be returned to a previous state
  – Example: a transaction-based environment where transactions update a database
    - It is possible to commit or undo updates on a per-transaction basis
    - A commit indicates that the transaction on the object was successful and its changes are permanent

(2) State-based Approach
• Principle: establish frequent 'recovery points' or 'checkpoints' saving the entire state of the process
• Actions:
  – 'Checkpointing' or 'taking a checkpoint': saving the process state
  – 'Rolling back' a process: restoring a process to a prior state

Log-Based Recovery
• When failures occur, the following operations that use the log are executed (see the sketch below):
• UNDO: restore the database to the state prior to execution.
• REDO: perform the changes to the database over again.
[Figure omitted: UNDO and REDO applied to the log.]
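A minimal sketch of UNDO and REDO, assuming a log of (transaction, item, before-image, after-image) records. Note that write() appends the log record before touching the database, which is exactly the write-ahead rule described in the next answer:

    log = []                 # stable log: (txn, item, before_image, after_image)
    db = {"x": 1, "y": 2}    # hypothetical database state

    def write(txn, item, value):
        log.append((txn, item, db[item], value))   # log record written BEFORE the update
        db[item] = value

    def undo(txn):
        # restore the state prior to txn by applying before-images in reverse order
        for t, item, before, _ in reversed(log):
            if t == txn:
                db[item] = before

    def redo(txn):
        # re-apply txn's changes using after-images, in log order
        for t, item, _, after in log:
            if t == txn:
                db[item] = after

    write("T1", "x", 10)
    undo("T1"); print(db)    # {'x': 1, 'y': 2}
    redo("T1"); print(db)    # {'x': 10, 'y': 2}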
Write-Ahead Logging Protocol
• Write-ahead logging (WAL) is a family of techniques for providing atomicity and durability in database systems.
• In a system using WAL, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log.
• The purpose of this can be illustrated by an example. Imagine a program that is in the middle of performing some operation when the machine it is running on loses power. Upon restart, that program might well need to know whether the operation it was performing succeeded, half-succeeded, or failed. If a write-ahead log is used, the program can check this log and compare what it was supposed to be doing when it unexpectedly lost power to what was actually done. On the basis of this comparison, the program could decide to undo what it had started, complete what it had started, or keep things as they are.

Q (B) What is Semantic Integrity control? Explain in concept of Centralized and Distributed environment.
A (B) Semantic Integrity Control
• Semantic integrity control ensures database consistency by rejecting update transactions that lead to inconsistent database states, or by activating specific actions on the database state, which compensate for the effects of the update transactions.
• Two main types of integrity constraints can be distinguished: structural constraints and behavioral constraints.
• Structural constraints express basic semantic properties inherent to a model. Examples of such constraints are unique key constraints in the relational model, or one-to-many associations between objects in the object-oriented model.
• Behavioral constraints are essential in the database design process. They can express associations between objects, such as inclusion dependency in the relational model, or describe object properties and structures.
Centralized Semantic Integrity Control
• Specification of Integrity Constraints
• Triggers (event-condition-action rules) can be used to automatically propagate updates, and thus to maintain semantic integrity.
• We can distinguish between three types of integrity constraints: predefined, precondition, or general constraints.
• Consider the schema:
  EMP(ENO, ENAME, TITLE)
  PROJ(PNO, PNAME, BUDGET)
  ASG(ENO, PNO, RESP, DUR)
• Predefined constraints are based on simple keywords. Through them, it is possible to express concisely the more common constraints of the relational model, such as non-null attribute, unique key, foreign key, or functional dependency.
• Employee number in relation EMP cannot be null:
  ENO NOT NULL IN EMP
• The project number PNO in relation ASG is a foreign key matching the primary key PNO of relation PROJ:
  PNO IN ASG REFERENCES PNO IN PROJ
• Precondition constraints express conditions that must be satisfied by all tuples in a relation for a given update type. The update type, which might be INSERT, DELETE, or MODIFY, permits restricting the integrity control.
• Precondition constraints can be expressed with the SQL CHECK statement enriched with the ability to specify the update type:
  CHECK ON <relation name> WHEN <update type> (<qualification over relation name>)
• The budget of a project is between 500K and 1000K:
  CHECK ON PROJ (BUDGET >= 500000 AND BUDGET <= 1000000)
• Only the tuples whose budget is 0 may be deleted:
  CHECK ON PROJ WHEN DELETE (BUDGET = 0)
• General constraints are formulas of tuple relational calculus where all variables are quantified. The database system must ensure that those formulas are always true:
  CHECK ON list of <variable name>:<relation name>, (<qualification>)
• The total duration for all employees in the CAD project is less than 100:
  CHECK ON g:ASG, j:PROJ (SUM(g.DUR WHERE g.PNO=j.PNO) < 100 IF j.PNAME="CAD/CAM")

Distributed Semantic Integrity Control
• Definition of Distributed Integrity Constraints
• Since assertions can involve data stored at different sites, the storage of the constraints must be decided so as to minimize the cost of integrity checking. There is a strategy based on a taxonomy of integrity constraints that distinguishes three classes:
• Individual constraints: single-relation single-variable constraints. They refer only to tuples to be updated independently of the rest of the database.
• Set-oriented constraints: include single-relation multivariable constraints such as
functional dependency, and multirelation multivariable constraints such as foreign key constraints.
• Constraints involving aggregates: require special processing because of the cost of evaluating the aggregates.

Individual constraints
• Consider relation EMP, horizontally fragmented across three sites using the predicates
  p1: 0 <= ENO < "E3"
  p2: "E3" <= ENO <= "E6"
  p3: ENO > "E6"
  and the domain constraint C: ENO < "E4".
• Constraint C is compatible with p1 (if C is true, p1 is true) and p2 (if C is true, p2 is not necessarily false), but not with p3 (if C is true, then p3 is false). Therefore, constraint C should be globally rejected because the tuples at site 3 cannot satisfy C, and thus relation EMP does not satisfy C.

Set-oriented constraints
• Set-oriented constraints are multivariable; that is, they involve join predicates.
• Three cases, given in increasing cost of checking, can occur:
  1. The fragmentation of R is derived from that of S based on a semijoin on the attribute used in the assertion join predicate.
  2. S is fragmented on the join attribute.
  3. S is not fragmented on the join attribute.
• In the first case, compatibility checking is cheap since the tuple of S matching a tuple of R is at the same site.
• In the second case, each tuple of R must be compared with at most one fragment of S, because the join attribute value of the tuple of R can be used to find the site of the corresponding fragment of S.
• In the third case, each tuple of R must be compared with all fragments of S. If compatibility is found for all tuples of R, the constraint can be stored at each site.

Constraints involving aggregates
• These constraints are among the most costly to test because they require the calculation of the aggregate functions.
• The aggregate functions generally manipulated are MIN, MAX, SUM, and COUNT.
• Each aggregate function contains a projection part and a selection part.

Q.5 (A) Explain Query Processing in Distributed Systems.
A.5 (A) Query Processing in Distributed Systems
• In a distributed DBMS the catalog has to store additional information, including the location of relations and their replicas. The catalog must also include system-wide information such as the number of sites in the system along with their identifiers.
Mapping Global Query to Local
• The tables required in a global query have fragments distributed across multiple sites. The local databases have information only about local data. The controlling site uses the global data dictionary to gather information about the distribution and reconstructs the global view from the fragments.
• If there is no replication, the global optimizer runs local queries at the sites where the fragments are stored. If there is replication, the global optimizer selects the site based upon communication cost, workload, and server speed.
• The global optimizer generates a distributed execution plan so that the least amount of data transfer occurs across the sites. The plan states the location of the fragments, the order in which query steps need to be executed, and the processes involved in transferring intermediate results.
• The local queries are optimized by the local database servers. Finally, the local query results are merged together through a union operation in case of horizontal fragments and a join operation for vertical fragments.

Example
• For example, let us consider that the following PROJECT schema is horizontally fragmented according to City, the cities being New Delhi, Kolkata and Hyderabad.
• PROJECT [schema table omitted; the relation includes at least City and Status attributes]
• Suppose there is a query to retrieve details of all projects whose status is "Ongoing".
• The global query will be
  σstatus="ongoing"(PROJECT)
• Query in New Delhi's server will be
  σstatus="ongoing"(NewD−PROJECT)
• Query in Kolkata's server will be
  σstatus="ongoing"(Kol−PROJECT)
• Query in Hyderabad's server will be
  σstatus="ongoing"(Hyd−PROJECT)
• In order to get the overall result, we need to union the results of the three queries as follows (see the sketch below):
  σstatus="ongoing"(NewD−PROJECT) ∪ σstatus="ongoing"(Kol−PROJECT) ∪ σstatus="ongoing"(Hyd−PROJECT)
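A toy sketch of this example, with hypothetical rows: each site evaluates the same selection on its horizontal fragment, and the controlling site unions the partial results:

    newd_project = [{"pno": "P1", "city": "New Delhi", "status": "ongoing"}]
    kol_project  = [{"pno": "P2", "city": "Kolkata",   "status": "finished"}]
    hyd_project  = [{"pno": "P3", "city": "Hyderabad", "status": "ongoing"}]

    def local_query(fragment):
        # sigma status="ongoing", evaluated locally at each site
        return [row for row in fragment if row["status"] == "ongoing"]

    # the union of the three local results reconstructs the global answer
    result = local_query(newd_project) + local_query(kol_project) + local_query(hyd_project)
    print(result)    # projects P1 and P3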
Q (B) Explain Deadlock in Distributed Systems.
A (B) Deadlock
• A deadlock can occur because transactions wait for one another. Informally, a deadlock situation is a set of requests that can never be granted by the concurrency control mechanism.
• A deadlock can be indicated by a cycle in the wait-for graph (WFG).
• In computer science, deadlock refers to a specific condition when two or more processes are each waiting for another to release a resource, or more than two processes are waiting for resources in a circular chain.

Deadlock Detection
• Deadlock detection is the process of actually determining that a deadlock exists and identifying the processes and resources involved in the deadlock (a code sketch is given below).
• Detection of a cycle in the WFG proceeds concurrently with normal operation in the main method of deadlock detection.

Deadlock Prevention
• The deadlock prevention approach does not allow any transaction to acquire locks that will lead to deadlocks. The convention is that when more than one transaction requests a lock on the same data item, only one of them is granted the lock.
• One of the most popular deadlock prevention methods is pre-acquisition of all the locks.

Deadlock Avoidance
• The deadlock avoidance approach handles deadlocks before they occur. It analyzes the transactions and the locks to determine whether or not waiting leads to a deadlock.
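The cycle test used by deadlock detection can be sketched as a depth-first search over the WFG. This is a minimal, single-site version; a distributed detector would additionally have to merge the per-site local WFGs:

    # wfg maps each transaction to the set of transactions it waits for
    def has_deadlock(wfg):
        nodes = set(wfg) | {t for waits in wfg.values() for t in waits}
        WHITE, GREY, BLACK = 0, 1, 2
        color = dict.fromkeys(nodes, WHITE)

        def visit(t):
            color[t] = GREY                      # t is on the current DFS path
            for u in wfg.get(t, ()):
                if color[u] == GREY:             # back edge: a cycle, hence a deadlock
                    return True
                if color[u] == WHITE and visit(u):
                    return True
            color[t] = BLACK
            return False

        return any(color[t] == WHITE and visit(t) for t in nodes)

    # T1 waits for T2 and T2 waits for T1: a circular chain, so a deadlock
    print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))    # True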
• There are two deadlock avoidance algorithms for this purpose, namely wait-die and wound-wait.
• Let us assume that there are two transactions, T1 and T2, where T1 tries to lock a data item which is already locked by T2. The algorithms are as follows (a sketch is given below) −
• Wait-Die − If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger than T2, T1 is aborted and later restarted.
• Wound-Wait − If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1 is younger than T2, T1 is allowed to wait.
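Both rules compare transaction timestamps, where "older" means a smaller timestamp. A minimal sketch (the return strings are illustrative labels):

    # requester asks for a lock currently held by holder
    def wait_die(requester_ts, holder_ts):
        # older requester waits; younger requester dies (is aborted and restarted)
        return "wait" if requester_ts < holder_ts else "abort requester"

    def wound_wait(requester_ts, holder_ts):
        # older requester wounds (aborts) the younger holder; younger requester waits
        return "abort holder" if requester_ts < holder_ts else "wait"

    print(wait_die(1, 2), wait_die(2, 1))      # wait, abort requester
    print(wound_wait(1, 2), wound_wait(2, 1))  # abort holder, wait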
OR

Q.5 (A) What is authorization control? How do we apply authorization control in a distributed environment?
A.5 (A) Authorization Control
• Authorization control consists of checking whether a given triple (subject, operation, object) can be allowed to proceed.
• The introduction of a subject in the system is typically done by a pair (user name, password).
• The objects to protect are subsets of the database. Relational systems provide finer and more general protection granularity than do earlier systems.
• A right expresses a relationship between a subject and an object for a particular set of operations:
  GRANT <operation type(s)> ON <object> TO <subject(s)>
  REVOKE <operation type(s)> FROM <object> TO <subject(s)>

Multilevel Access Control
• Discretionary access control has some limitations. One problem is that a malicious user can access unauthorized data through an authorized user.
• For instance, consider user A who has authorized access to relations R and S and user B who has authorized access to relation S only. If B somehow manages to modify an application program used by A so it writes R data into S, then B can read unauthorized data without violating authorization rules.
• Multilevel access control answers this problem and further improves security by defining different security levels for both subjects and data objects.
• A process has a security level, also called clearance, derived from that of the user.
• In its simplest form, the security levels are Top Secret (TS), Secret (S), Confidential (C) and Unclassified (U), ordered as TS > S > C > U, where ">" means "more secure".
• Access in read and write modes by subjects is restricted by two simple rules (sketched below):
Rule 1 (called "no read up")
• protects data from unauthorized disclosure, i.e., a subject at a given security level can only read objects at the same or lower security levels.
Rule 2 (called "no write down")
• protects data from unauthorized change, i.e., a subject at a given security level can only write objects at the same or higher security levels.
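The two rules reduce to simple comparisons on the TS > S > C > U ordering. A minimal sketch:

    LEVEL = {"U": 0, "C": 1, "S": 2, "TS": 3}   # Unclassified < Confidential < Secret < Top Secret

    def can_read(subject_level, object_level):
        # Rule 1, "no read up": read only objects at the same or lower level
        return LEVEL[subject_level] >= LEVEL[object_level]

    def can_write(subject_level, object_level):
        # Rule 2, "no write down": write only objects at the same or higher level
        return LEVEL[subject_level] <= LEVEL[object_level]

    assert can_read("S", "C") and not can_read("C", "S")
    assert can_write("C", "S") and not can_write("S", "C")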
Distributed Access Control
• The additional problems of access control in a distributed environment stem from the fact that objects and subjects are distributed and that messages with sensitive data can be read by unauthorized users.
• These problems are: remote user authentication, management of discretionary access rules, handling of views and of user groups, and enforcing multilevel access control.
• Remote user authentication is necessary since any site of a distributed DBMS may accept programs initiated, and authorized, at remote sites.
• Three solutions are possible for managing authentication:
  1. Authentication information is maintained at a central site for global users, who can then be authenticated only once and then accessed from multiple sites.
  2. The information for authenticating users (user name and password) is replicated at all sites in the catalog.
  3. Intersite communication is protected by the use of a site password. Once the initiating site has been authenticated, there is no need for authenticating their remote users.

Q (B) Write a short note on Query Optimization.
A (B) Query optimization refers to the process of producing a query execution plan (QEP) which represents an execution strategy for the query. This QEP minimizes an objective cost function. A query optimizer, the software module that performs query optimization, is usually seen as consisting of three components: a search space, a cost model, and a search strategy. The search space is the set of alternative execution plans that represent the input query. These plans are equivalent, in the sense that they yield the same result, but they differ in the execution order of operations and the way these operations are implemented, and therefore in their performance. The cost model predicts the cost of a given execution plan.
To be accurate, the cost model must have good knowledge about the distributed execution environment. The search strategy explores the search space and selects the best plan, using the cost model. It defines which plans are examined and in which order. The details of the environment (centralized versus distributed) are captured by the search space and the cost model.
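To tie the three components together, here is a toy dynamic-programming join-order enumerator (the details of the search space and strategy are given below). The search space is the set of join trees over subsets of relations, the search strategy is breadth-first dynamic programming that keeps only the best plan per subset, and the cost model is a deliberately crude size estimate; the cardinalities and the 0.01 selectivity are hypothetical:

    from itertools import combinations

    CARD = {"R": 1000, "S": 200, "T": 50}    # hypothetical base-relation cardinalities
    SEL = 0.01                               # hypothetical, uniform join selectivity

    def best_plan(relations):
        # plans maps a frozenset of relations to (cost, result_size, plan_tree)
        plans = {frozenset([r]): (0, CARD[r], r) for r in relations}
        for k in range(2, len(relations) + 1):
            for subset in map(frozenset, combinations(relations, k)):
                best = None
                for m in range(1, k):                     # every split into left/right
                    for left in map(frozenset, combinations(sorted(subset), m)):
                        right = subset - left
                        lcost, lsize, ltree = plans[left]
                        rcost, rsize, rtree = plans[right]
                        size = max(1, int(lsize * rsize * SEL))
                        cost = lcost + rcost + size       # toy cost: total intermediate size
                        if best is None or cost < best[0]:
                            best = (cost, size, (ltree, rtree))
                plans[subset] = best                      # prune: keep the best plan per subset
        return plans[frozenset(relations)]

    print(best_plan(["R", "S", "T"]))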
1 Search Space
Query execution plans are typically abstracted by means of operator trees, which define the order in which the operations are executed. They are enriched with additional information, such as the best algorithm chosen for each operation. For a given query, the search space can thus be defined as the set of equivalent operator trees that can be produced using transformation rules. To characterize query optimizers, it is useful to concentrate on join trees, which are operator trees whose operators are join or Cartesian product. This is because permutations of the join order have the most important effect on the performance of relational queries.

2 Search Strategy
The most popular search strategy used by query optimizers is dynamic programming, which is deterministic. Deterministic strategies proceed by building plans, starting from base relations, joining one more relation at each step until complete plans are obtained, as in the figure. Dynamic programming builds all possible plans, breadth-first, before it chooses the "best" plan. To reduce the optimization cost, partial plans that are not likely to lead to the optimal plan are pruned (i.e., discarded) as soon as possible. By contrast, another deterministic strategy, the greedy algorithm, builds only one plan, depth-first.

3 Distributed Cost Model
An optimizer's cost model includes cost functions to predict the cost of operators, statistics and base data, and formulas to evaluate the sizes of intermediate results. The cost is in terms of execution time, so a cost function represents the execution time of a query.

*****Best of Luck*****