Distributed databases
Upcoming SlideShare
Loading in...5
×
 

Distributed databases

on

  • 291 views

 

Statistics

Views

Total Views
291
Views on SlideShare
291
Embed Views
0

Actions

Likes
0
Downloads
36
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Distributed databases Distributed databases Presentation Transcript

  • Distributed Databases by Chien-Pin Hsu CS157B Section 1 Nov 11, 2004 Dr. Sin-Min Lee
  • Distributed Database System A distributed database system consists of loosely coupled sites that share no physical component Appears to user as a single system Database systems that run on each site are independent of each other Processing maybe done at a site other than the initiator of request
  • Homogenous Distributed Database Systems  All sites have identical software  They are aware of each other and agree to cooperate in processing user requests  It appears to user as a single system
  • An Homogenous Distributed Database Systems example  A distributed system connects three databases: hq, mfg, and sales  An application can simultaneously access or modify the data in several databases in a single distributed environment.
  • What can we do? A single query from a Manufacturing client on local database mfg can retrieve joined data from the products table on the local database and the dept table on the remote hq database.  For a client application, the location and platform of the databases are transparent.
  • Makes life easier!! For example, if you are connected to database mfg but want to access data on database hq, creating a synonym on mfg for the remote dept table enables you to issue this query: SELECT * FROM dept In this way, a distributed system gives the appearance of native data access.  Users on mfg do not have to know that the data they access resides on remote databases.
  • Heterogeneous Distributed Database System  In a heterogeneous distributed database system, at least one of the databases uses different schemas and software.  A database system having different schema may cause a major problem for query processing.  A database system having different software may cause a major problem for transaction processing.
  • Distributed Data Storage  Replication – System maintains multiple copies of data, stored in different sites, for faster retrieval and fault tolerance.  Fragmentation – Relation is partitioned into several fragments stored in distinct sites Replication and fragmentation can be combined • Relation is partitioned into several fragments: system maintains several identical replicas of each such fragment.
  • Advantages of Replication Availability: failure of site containing relation r does not result in unavailability of r is replicas exist. Parallelism: queries on r may be processed by several nodes in parallel. Reduced data transfer: relation r is available locally at each site containing a replica of r.
  • Disadvantages of Replication Increased cost of updates: each replica of relation r must be updated. Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented. • One solution: choose one copy as primary copy and apply concurrency control operations on primary copy.
  • Fragmentation  Data can be distributed by storing individual tables at different sites  Data can also be distributed by decomposing a table and storing portions at different sites – called Fragmentation  Fragmentation can be horizontal or vertical
  • Why use Fragmentation?  Usage - in general applications use views so it’s appropriate to work with subsets  Efficiency - data stored close to where it is most frequently used  Parallelism - a transaction can divided into several sub-queries to increase degree of concurrency  Security - data more secure - only stored where it is needed Disadvantages: Performance - may be slower Integrity - more difficult
  • Horizontal Fragmentation Each fragment, Ti , of table T contains a subset of the rows Each tuple of T is assigned to one or more fragments. Horizontal fragmentation is lossless
  • Horizontal Fragmentation Example  A bank account schema has a relation Account-schema = (branch-name, account-number, balance).  It fragments the relation by location and stores each fragment locally: rows with branch-name = `Hillside` are stored in the Hillside in a fragment
  • Vertical Fragmentation  Each fragment, Ti, of T contains a subset of the columns, each column is in at least one fragment, and each fragment includes the key: Ti = Πattr_listi (T) T = T1 T2 ….. Tn  All schemas must contain a common candidate key (or superkey) to ensure lossless join property.  A special attribute, the tuple-id attribute may be added to each schema to serve as a candidate key.
  • Vertical Fragmentation Example A employee-info schema has a relation employee-info schema = (designation, name, Employee-id, salary). It fragments the relation to put information in two tables for security concern.
  • Commit Protocols  Commit protocols are used to ensure atomicity across sites  Atomicity states that database modifications must follow an “all or nothing” rule.  a transaction which executes at multiple sites must either be committed at all the sites, or aborted at all the sites.
  • The Two-Phase Commit (2 PC) Protocol What is this?  Two-phase commit is a transaction protocol designed for the complications that arise with distributed resource managers.  Two-phase commit technology is used for hotel and airline reservations, stock market transactions, banking applications, and credit card systems.  With a two-phase commit protocol, the distributed transaction manager employs a coordinator to manage the individual resource managers. The commit process proceeds as follows:
  • Phase1: Obtaining a Decision Step 1  Coordinator asks all participants to prepare to commit transaction Ti.  Ci adds the records <prepare T> to the log and forces log to stable storage (a log is a file which maintains a record of all changes to the database)  sends prepare T messages to all sites where T executed
  • Phase1: Making a Decision Step 2  Upon receiving message, transaction manager at site determines if it can commit the transaction  if not: add a record <no T> to the log and send abort T message to Ci  if the transaction can be committed, then: 1). add the record <ready T> to the log 2). force all records for T to stable storage 3). send ready T message to Ci
  • Phase 2: Recording the Decision  Step 1  T can be committed of Ci received a ready T message from all the participating sites: otherwise T must be aborted.  Step 2  Coordinator adds a decision record, <commit T> or <abort T>, to the log and forces record onto stable storage. Once the record is in stable storage, it cannot be revoked (even if failures occur)  Step 3  Coordinator sends a message to each participant informing it of the decision (commit or abort)  Step 4  Participants take appropriate action locally.
  • Two-Phase Commit Diagram
  • Costs and Limitations There have been two performance issues with two phase commit: – If one database server is unavailable, none of the servers gets the updates. – This is correctable through network tuning and correctly building the data distribution through database optimization techniques.