R.BOOMADEVI.,M.C.A.,M.E
[A/P] CSE DEPARTMENT
CHRIST THE KING ENGINEERING COLLEGE
COIMBATORE
• A distributed database system consists of loosely coupled sites that share no physical component.
• It appears to the user as a single system.
• The database systems that run on each site are independent of each other.
• Processing may be done at a site other than the one that initiated the request.
• In a homogeneous distributed database, all sites have identical software.
• The sites are aware of each other and agree to cooperate in processing user requests.
• The system appears to the user as a single system.
• Example: a distributed system connects three databases: hq, mfg, and sales.
• An application can simultaneously access or modify the data in several databases in a single distributed environment.
• In a heterogeneous distributed database system, at least one of the databases uses different schemas and software.
• Differences in schema can cause major problems for query processing.
• Differences in software can cause major problems for transaction processing.
• Replication
◦ The system maintains multiple copies of data, stored at different sites, for faster retrieval and fault tolerance.
• Fragmentation
◦ A relation is partitioned into several fragments stored at distinct sites.
• Replication and fragmentation can be combined
◦ A relation is partitioned into several fragments, and the system maintains several identical replicas of each fragment.
Advantages of replication:
• Availability: failure of a site containing relation r does not result in unavailability of r if replicas exist.
• Parallelism: queries on r may be processed by several nodes in parallel.
• Reduced data transfer: relation r is available locally at each site containing a replica of r.
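The availability point can be illustrated with a minimal sketch. The `Site` class, the `read_from_any_replica` helper, and the sample row are hypothetical and exist only for this illustration; a real system would locate replicas through its catalog rather than by scanning a list.

```python
class Site:
    """Hypothetical in-memory site: maps relation names to lists of rows."""

    def __init__(self, name, relations, up=True):
        self.name = name
        self.relations = relations
        self.up = up

    def read(self, relation):
        if not self.up:
            raise ConnectionError(f"site {self.name} is down")
        return self.relations[relation]


def read_from_any_replica(sites, relation):
    """Return (site name, rows) from the first reachable site holding a replica."""
    for site in sites:
        if relation not in site.relations:
            continue
        try:
            return site.name, site.read(relation)
        except ConnectionError:
            continue  # this replica is unreachable, try the next one
    raise RuntimeError(f"no replica of {relation!r} is available")


rows = [("Hillside", "A-305", 500)]  # illustrative data only
sites = [Site("s1", {"r": rows}, up=False), Site("s2", {"r": rows})]
served_by, data = read_from_any_replica(sites, "r")
print(served_by, data)
```

Here site s1 has failed, yet relation r is still readable because s2 holds a replica.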
Disadvantages of replication:
• Increased cost of updates: each replica of relation r must be updated.
• Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented.
◦ One solution: choose one copy as the primary copy and apply concurrency control operations on the primary copy.
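A rough sketch of the primary-copy idea follows. The `PrimaryCopyItem` class is hypothetical, and it assumes that updates are propagated to the secondary replicas synchronously while the primary's lock is held; real systems may propagate lazily instead.

```python
import threading


class PrimaryCopyItem:
    """A data item with one primary copy and several secondary replicas."""

    def __init__(self, value, n_secondaries):
        self._lock = threading.Lock()  # lock managed only at the primary site
        self.primary = value
        self.secondaries = [value] * n_secondaries

    def update(self, fn):
        # All writers serialize on the primary copy's lock.
        with self._lock:
            self.primary = fn(self.primary)
            # Assumption: propagate synchronously before releasing the lock.
            self.secondaries = [self.primary] * len(self.secondaries)
            return self.primary


balance = PrimaryCopyItem(0, n_secondaries=2)


def deposit_many():
    for _ in range(1000):
        balance.update(lambda v: v + 1)


threads = [threading.Thread(target=deposit_many) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance.primary, balance.secondaries)
```

Because every update passes through the single lock on the primary copy, concurrent writers cannot leave the replicas with different values.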
• Data can be distributed by storing individual tables at different sites.
• Data can also be distributed by decomposing a table and storing portions at different sites; this is called fragmentation.
• Fragmentation can be horizontal or vertical.
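Advantages of fragmentation:
• Usage: applications generally work with views rather than entire relations, so it is appropriate to work with subsets of a relation.
• Efficiency: data is stored close to where it is most frequently used.
• Parallelism: a transaction can be divided into several subqueries, which increases the degree of concurrency.
• Security: data is more secure because it is stored only where it is needed.
Disadvantages:
• Performance: may be slower when an application needs data from several fragments at different sites.
• Integrity: integrity control is more difficult.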
• In horizontal fragmentation, each fragment Ti of table T contains a subset of the rows.
• Each tuple of T is assigned to one or more fragments.
• Horizontal fragmentation is lossless: T can be reconstructed as the union of its fragments.
• A bank account schema has the relation Account-schema = (branch-name, account-number, balance).
• The relation is fragmented by location and each fragment is stored locally: rows with branch-name = "Hillside" are stored in a fragment at the Hillside site.
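The bank example can be sketched in a few lines. The account numbers, balances, and the second branch name are made-up sample data, and `fragment_horizontally` and `reconstruct` are hypothetical helper names.

```python
from collections import defaultdict

# Rows of (branch-name, account-number, balance); values are illustrative only.
account = [
    ("Hillside", "A-305", 500),
    ("Hillside", "A-226", 336),
    ("Valleyview", "A-177", 205),
]


def fragment_horizontally(rows, key_index=0):
    """Assign each row to the fragment named by its branch-name value."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[key_index]].append(row)
    return dict(fragments)


def reconstruct(fragments):
    """Rebuild the relation as the union of all fragments."""
    rows = set()
    for fragment in fragments.values():
        rows.update(fragment)
    return rows


fragments = fragment_horizontally(account)
print(fragments["Hillside"])
print(reconstruct(fragments) == set(account))
```

The union in `reconstruct` is what makes the fragmentation lossless: no row is dropped, and a row stored in more than one fragment appears only once.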
• In vertical fragmentation, each fragment Ti of T contains a subset of the columns; each column is in at least one fragment, and each fragment includes the key:
Ti = Π_{attr_list_i}(T)
T = T1 ⋈ T2 ⋈ … ⋈ Tn
• All schemas must contain a common candidate key (or superkey) to ensure the lossless-join property.
• A special attribute, the tuple-id attribute, may be added to each schema to serve as a candidate key.
• An employee-info schema has the relation employee-info = (designation, name, employee-id, salary).
• The relation is fragmented vertically into two tables for security reasons.
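A small sketch of this vertical split follows. The slide does not say which columns go into which table, so this assumes that salary is separated from the other attributes and that employee-id is the shared key; the sample rows and the `project` and `natural_join` helpers are hypothetical.

```python
# Illustrative rows only; attribute names follow the employee-info schema.
employee_info = [
    {"employee_id": 1, "name": "Asha", "designation": "Engineer", "salary": 50000},
    {"employee_id": 2, "name": "Ravi", "designation": "Manager", "salary": 70000},
]


def project(rows, attrs):
    """Projection: keep only the listed attributes of each row."""
    return [{a: row[a] for a in attrs} for row in rows]


def natural_join(left, right, key):
    """Join two fragments on their common key attribute."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


# Both fragments keep the key so the original relation can be rebuilt.
public = project(employee_info, ["employee_id", "name", "designation"])
private = project(employee_info, ["employee_id", "salary"])

rebuilt = natural_join(public, private, "employee_id")
print(rebuilt == employee_info)
```

A site that stores only the `public` fragment never sees salaries, while joining on the shared key recovers the full relation where both fragments are available.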
• Commit protocols are used to ensure atomicity across sites.
• Atomicity states that database modifications must follow an “all or nothing” rule.
• A transaction that executes at multiple sites must either be committed at all the sites or aborted at all the sites.
• What is two-phase commit?
• Two-phase commit is a transaction protocol designed for the complications that arise with distributed resource managers.
• Two-phase commit technology is used for hotel and airline reservations, stock market transactions, banking applications, and credit card systems.
• With a two-phase commit protocol, the distributed transaction manager employs a coordinator to manage the individual resource managers. The commit process proceeds as follows:
Phase 1: obtaining a decision
• Step 1: The coordinator Ci asks all participants to prepare to commit transaction T.
◦ Ci adds the record <prepare T> to the log and forces the log to stable storage (the log is a file that maintains a record of all changes to the database).
◦ Ci sends prepare T messages to all sites where T executed.
• Step 2: Upon receiving the message, the transaction manager at each site determines whether it can commit the transaction.
◦ If not: add a record <no T> to the log and send an abort T message to Ci.
◦ If the transaction can be committed:
1) add the record <ready T> to the log
2) force all records for T to stable storage
3) send a ready T message to Ci
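The participant's side of the prepare step can be sketched as follows. `ParticipantSite` is a hypothetical class, the two lists only stand in for a buffered log and stable storage, and messages are modeled as plain strings.

```python
class ParticipantSite:
    """Hypothetical participant that answers a prepare T message."""

    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.volatile_log = []  # records not yet forced
        self.stable_log = []    # stands in for stable storage

    def force_log(self):
        # Move all buffered records to "stable storage".
        self.stable_log.extend(self.volatile_log)
        self.volatile_log.clear()

    def on_prepare(self, t):
        if not self.can_commit:
            self.volatile_log.append(f"<no {t}>")
            return f"abort {t}"
        self.volatile_log.append(f"<ready {t}>")
        self.force_log()  # force records for T before replying
        return f"ready {t}"


healthy = ParticipantSite("s1", can_commit=True)
failing = ParticipantSite("s2", can_commit=False)
reply_ok = healthy.on_prepare("T")
reply_no = failing.on_prepare("T")
print(reply_ok, healthy.stable_log)
print(reply_no, failing.volatile_log)
```

The ordering matters: the ready record reaches stable storage before the ready message is sent, so a participant that crashes after voting can still honor its vote on recovery.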
Phase 2: recording the decision
• Step 1: T can be committed if Ci received a ready T message from all the participating sites; otherwise T must be aborted.
• Step 2: The coordinator adds a decision record, <commit T> or <abort T>, to the log and forces the record onto stable storage. Once the record is in stable storage, the decision cannot be revoked (even if failures occur).
• Step 3: The coordinator sends a message to each participant informing it of the decision (commit or abort).
• Step 4: Participants take the appropriate action locally.
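Both phases can be put together in one simplified sketch. It assumes reliable in-process method calls instead of network messages and does not model failures, timeouts, or recovery; `Participant` and `two_phase_commit` are hypothetical names.

```python
class Participant:
    """Hypothetical participant site with an in-memory log."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []
        self.state = "active"

    def prepare(self, t):
        # Phase 1: vote ready or abort.
        if self.can_commit:
            self.log.append(f"<ready {t}>")
            return "ready"
        self.log.append(f"<no {t}>")
        return "abort"

    def decide(self, t, decision):
        # Phase 2: apply the coordinator's decision locally.
        self.log.append(f"<{decision} {t}>")
        self.state = "committed" if decision == "commit" else "aborted"


def two_phase_commit(t, participants, coordinator_log):
    coordinator_log.append(f"<prepare {t}>")
    votes = [p.prepare(t) for p in participants]
    # Commit only if every participant voted ready.
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.append(f"<{decision} {t}>")  # decision record
    for p in participants:
        p.decide(t, decision)
    return decision


log_ok, log_fail = [], []
ok = two_phase_commit("T", [Participant("s1"), Participant("s2")], log_ok)
failed_sites = [Participant("s1"), Participant("s2", can_commit=False)]
fail = two_phase_commit("T", failed_sites, log_fail)
print(ok, log_ok)
print(fail, log_fail)
```

In the second run a single abort vote is enough to abort T at every site, including the site that voted ready.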
• Performance issues with two-phase commit:
◦ If one database server is unavailable, none of the servers receives the updates.
◦ This can be mitigated through network tuning and by designing the data distribution carefully using database optimization techniques.
 THANK YOU

Distributed databases, types of database
