Database ppt
Database Presentation

Published in: Education, Technology

Transcript

  • 1. Parallel and Distributed Databases
    • CS263 Lecture 16
  • 3. LECTURE PLAN
        • Parallel DBMS - What and Why?
        • What is a Client/Server DBMS?
        • Why do we need Distributed DBMSs?
        • Date’s rules for a Distributed DBMS
        • Benefits of a Distributed DBMS
        • Issues associated with a Distributed DBMS
        • Disadvantages of a Distributed DBMS
  • 4. PARALLEL DATABASE SYSTEM
  • 5. PARALLEL DBMSs: WHY DO WE NEED THEM?
    • More and More Data!
    • We have databases that hold huge amounts of data, on the order of 10^12 bytes: 1,000,000,000,000 bytes!
    • Faster and Faster Access!
    • We have data applications that need to process data at very high speeds: tens of thousands of transactions per second!
    • SINGLE-PROCESSOR DBMSs AREN’T UP TO THE JOB!
  • 6. PARALLEL DBMSs: BENEFITS OF A PARALLEL DBMS
    • INTERQUERY PARALLELISM
    • It is possible to process a number of transactions in parallel with each other.
    • Improves Throughput.
    • INTRAQUERY PARALLELISM
    • It is possible to process ‘sub-tasks’ of a transaction in parallel with each other.
    • Improves Response Time.
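The two kinds of parallelism can be sketched in a few lines of Python. This is only an illustration of the structure, not how a real parallel DBMS is built: the `BALANCES` list, the function names and the use of a thread pool are all assumptions made for the example, and Python threads will not actually speed up CPU-bound scans.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "table": one value per record.
BALANCES = list(range(1, 10_001))


def scan_sum(rows):
    """One sub-task: scan a slice of the table and total it."""
    return sum(rows)


def intraquery_sum(rows, workers=4):
    """Intraquery: split ONE query into sub-tasks run in parallel."""
    size = (len(rows) + workers - 1) // workers  # rows per partition
    parts = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(scan_sum, parts))


def interquery_run(queries, workers=4):
    """Interquery: run SEVERAL independent queries side by side."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(query) for query in queries]
        return [future.result() for future in futures]
```

`intraquery_sum` shortens a single query by sharing its work, while `interquery_run` lets more queries finish per unit of time without making any one of them faster.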
  • 7. PARALLEL DBMSs: HOW TO MEASURE THE BENEFITS
    • Speed-up.
    • As you multiply resources by a certain factor, the time taken to execute a transaction should be reduced by the same factor:
    • 10 seconds to scan a DB of 10,000 records using 1 CPU
    • 1 second to scan a DB of 10,000 records using 10 CPUs
    • Scale-up.
    • As you multiply resources by a certain factor, the size of a task that can be executed in a given time should increase by the same factor:
    • 1 second to scan a DB of 1,000 records using 1 CPU
    • 1 second to scan a DB of 10,000 records using 10 CPUs
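The two measures can be written as simple ratios. The sketch below uses the usual definitions (speed-up compares elapsed times for the same task, scale-up compares a small task on a small system with a proportionally larger task on a larger system); the function names are chosen for this example only.

```python
import math


def speed_up(time_original, time_larger_system):
    """Same task, more resources: how many times faster did it get?"""
    return time_original / time_larger_system


def scale_up(time_small_task_small_system, time_big_task_big_system):
    """Task and resources grow together: 1.0 means linear scale-up."""
    return time_small_task_small_system / time_big_task_big_system


def is_linear_speed_up(time_original, time_larger_system, resource_factor):
    """Linear speed-up: the ratio equals the factor by which resources grew."""
    return math.isclose(speed_up(time_original, time_larger_system),
                        resource_factor)


# Figures from the slide: 10 s on 1 CPU versus 1 s on 10 CPUs.
slide_speed_up = speed_up(10, 1)
# Figures from the slide: 1,000 records on 1 CPU and 10,000 records on
# 10 CPUs both take 1 s.
slide_scale_up = scale_up(1, 1)
```

With the slide's figures the speed-up equals the resource factor of 10 and the scale-up ratio is 1, so both examples describe the ideal, linear case.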
  • 8. PARALLEL DBMSs: SPEED-UP [Figure: number of transactions/second against number of CPUs, with a linear speed-up (ideal) line and a sub-linear speed-up curve. Labelled points: 1000/sec at 5 CPUs, 2000/sec at 10 CPUs, 1600/sec at 16 CPUs.]
  • 9. PARALLEL DBMSs: SCALE-UP [Figure: number of transactions/second against number of CPUs and database size, with a linear scale-up (ideal) line and a sub-linear scale-up curve. Labels: 5 CPUs with a 1 GB database, 10 CPUs with a 2 GB database, 1000/sec, 900/sec.]
  • 10. Shared Memory – Parallel Database Architecture [Diagram: six CPUs attached to a single shared MEMORY.]
  • 11. Shared Disk – Parallel Database Architecture [Diagram: six CPUs, each with its own memory (M), attached to shared disks.]
  • 12. Shared Nothing – Parallel Database Architecture [Diagram: five nodes, each a CPU with its own memory (M).]
  • 13. MAINFRAME DATABASE SYSTEM
  • 14. [Diagram: DUMB TERMINALS joined by a SPECIALISED NETWORK CONNECTION to a MAINFRAME COMPUTER, which holds the PRESENTATION LOGIC, BUSINESS LOGIC and DATA LOGIC.]
  • 15. CLIENT/SERVER DATABASE SYSTEM
  • 16. CLIENT/SERVER DBMS: CLIENT PROCESS
        • Manages user interface
        • Accepts user data
        • Processes application/business logic
        • Generates database requests (SQL)
        • Transmits database requests to server
        • Receives results from server
        • Formats results according to application logic
        • Presents results to the user
  • 17. CLIENT/SERVER DBMS: SERVER PROCESS
        • Accepts database requests
        • Processes database requests
          • Performs integrity checks
          • Handles concurrent access
          • Optimises queries
          • Performs security checks
          • Enacts recovery routines
        • Transmits result of database request to client
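The division of labour between the two processes can be sketched with Python's built-in `sqlite3` module. This is a single-process stand-in, so the network hop is left out; the `account` table, its columns and the function names are assumptions made for the example.

```python
import sqlite3


def make_server():
    """Stand-in for the server process: it owns the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " acct_no INTEGER PRIMARY KEY,"
        " customer TEXT NOT NULL,"
        " balance REAL NOT NULL CHECK (balance >= 0))"  # integrity rule
    )
    conn.execute("INSERT INTO account VALUES (200, 'JONES', 1000.00)")
    conn.commit()
    return conn


def server_handle(conn, sql, params=()):
    """Server side: accept a request, process it, return the result."""
    try:
        with conn:  # commits on success, rolls back on an error
            return conn.execute(sql, params).fetchall()
    except sqlite3.IntegrityError as exc:
        return [("ERROR", str(exc))]


def client_balance(conn, acct_no):
    """Client side: build the SQL request, send it, format the reply."""
    rows = server_handle(
        conn,
        "SELECT customer, balance FROM account WHERE acct_no = ?",
        (acct_no,),
    )
    if not rows:
        return "no such account"
    customer, balance = rows[0]
    return f"{customer}: {balance:.2f}"
```

The client never touches the table directly: it only generates requests and formats results, while the integrity check (here a `CHECK` constraint) is enforced on the server side.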
  • 18. CLIENT/SERVER DBMS ARCHITECTURE (FAT CLIENT) [Diagram: CLIENT#1, CLIENT#2 and CLIENT#3 exchange data requests and data responses over the NETWORK with the SERVER DBMS and its D/BASE. Labels: PRESENTATION LOGIC, BUSINESS LOGIC, DATA LOGIC.]
  • 19. CLIENT/SERVER DBMS ARCHITECTURE (THIN CLIENT) [Diagram: CLIENT#1, CLIENT#2 and CLIENT#3 exchange data requests and data responses over the NETWORK with the SERVER DBMS and its D/BASE. Labels: PRESENTATION LOGIC, BUSINESS LOGIC, DATA LOGIC, PL/SQL.]
  • 20. DISTRIBUTED PROCESSING ARCHITECTURE [Diagram: LANs of clients at Leyton, Stratford, Barking and Leytonstone, connected by a WIDE AREA NETWORK, with a single DBMS.]
  • 21. DISTRIBUTED DATABASE SYSTEM
  • 22. DISTRIBUTED DATABASES: WHAT IS A DISTRIBUTED DATABASE?
        • A distributed database system is a collection of logically related databases that co-operate in a transparent manner.
        • Transparent implies that each user within the system may access all of the data within all of the databases as if they were a single database.
        • There should be ‘location independence’, i.e. because the user is unaware of where the data is located, it is possible to move the data from one physical location to another without affecting the user.
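Location independence can be illustrated with a toy catalog that maps each relation to the site holding it. Everything here (the `catalog` and `sites` dictionaries, the function names, the sample rows) is hypothetical and only meant to show the idea: the user-facing `read` call does not change when the data moves.

```python
# Hypothetical catalog: relation name -> site that currently stores it.
catalog = {"account": "Stratford"}

# Hypothetical sites, each holding zero or more relations.
sites = {
    "Stratford": {"account": [(200, "JONES"), (345, "SMITH")]},
    "Barking": {},
    "Leyton": {},
}


def read(relation):
    """What the user calls: no site name appears in the request."""
    return sites[catalog[relation]][relation]


def move(relation, new_site):
    """What an administrator calls: relocate the data, update the catalog."""
    old_site = catalog[relation]
    sites[new_site][relation] = sites[old_site].pop(relation)
    catalog[relation] = new_site
```

Calling `move("account", "Leyton")` changes where the rows live, but `read("account")` returns the same rows before and after.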
  • 23. DISTRIBUTED DATABASE ARCHITECTURE [Diagram: Leyton, Stratford, Barking and Leytonstone each have clients and their own DBMS; the sites are connected by a WIDE AREA NETWORK.]
  • 24. M:N CLIENT/SERVER DBMS ARCHITECTURE [Diagram: CLIENT#1, CLIENT#2 and CLIENT#3 connect over the NETWORK to D/BASE SERVER #1 and D/BASE SERVER #2, each with its own DBMS. NOT TRANSPARENT!]
  • 25. COMPONENTS OF A DDBMS [Diagram: Site 1 and Site 2 joined by a Computer Network. Components shown: GSC, DDBMS, DC, LDBMS, DB.]
    • LDBMS = Local DBMS
    • DC = Data Communications
    • GSC = Global Systems Catalog
    • DDBMS = Distributed DBMS
  • 26. DISTRIBUTED DATABASES: ADVANTAGES
    • Reduced Communication Overhead
    • Most data access is local, which is less expensive and performs better.
    • Improved Processing Power
    • Instead of one server handling the full database, we now have a collection of machines handling the same database.
    • Removal of Reliance on a Central Site
    • If a server fails, then the only part of the system that is affected is the relevant local site. The rest of the system remains functional and available.
  • 27. DISTRIBUTED DATABASES: ADVANTAGES
    • Expandability
    • It is easier to accommodate increasing the size of the global (logical) database.
    • Local autonomy
    • The database is brought nearer to its users. This can effect a cultural change as it allows potentially greater control over local data.
  • 28. DISTRIBUTED DATABASES: DATE’S TWELVE RULES FOR A DDBMS
      • A distributed system looks exactly like a non-distributed system to the user!
      • Local autonomy
      • No reliance on a central site
      • Continuous operation
      • Location independence
      • Fragmentation independence
      • Replication independence
      • Distributed query processing
      • Distributed transaction processing
      • Hardware independence
      • Operating system independence
      • Network independence
      • Database independence
  • 29. DISTRIBUTED DATABASES: ISSUES
        • Data Allocation
        • Data Fragmentation
        • Distributed Catalogue Management
        • Distributed Transactions
        • Distributed Queries – (see chapter 20)
  • 30. DISTRIBUTED DATABASES: DATA ALLOCATION METRICS
      • Locality of reference
        • Is the data near to the sites that need it?
      • Reliability and availability
        • Does the strategy improve fault tolerance and accessibility?
      • Performance
        • Does the strategy result in bottlenecks or under-utilisation of resources?
      • Storage costs
        • How does the strategy affect the availability and cost of data storage?
      • Communication costs
        • How much network traffic will result from the strategy?
  • 31. DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES
      • CENTRALISED
        • Locality of Reference: Lowest
        • Reliability/Availability: Lowest
        • Storage Costs: Lowest
        • Performance: Unsatisfactory
        • Communication Costs: Highest
  • 32. DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES
      • PARTITIONED/FRAGMENTED
        • Locality of Reference: High
        • Reliability/Availability: Low (item) – High (system)
        • Storage Costs: Lowest
        • Performance: Satisfactory
        • Communication Costs: Low
  • 33. DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES
      • COMPLETE REPLICATION
        • Locality of Reference: Highest
        • Reliability/Availability: Highest
        • Storage Costs: Highest
        • Performance: High
        • Communication Costs: High (update) – Low (read)
  • 34. DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES
      • SELECTIVE REPLICATION
        • Locality of Reference: High
        • Reliability/Availability: Low (item) – High (system)
        • Storage Costs: Average
        • Performance: Satisfactory
        • Communication Costs: Low
  • 35. DISTRIBUTED DATABASES: WHY FRAGMENT DATA?
      • Usage
        • Applications are usually interested in ‘views’, not whole relations.
      • Efficiency
        • It’s more efficient if data is close to where it is frequently used.
      • Parallelism
        • It is possible to run several ‘sub-queries’ in tandem.
      • Security
        • Data not required by local applications is not stored at the local site.
  • 36. DISTRIBUTED DATABASES: HORIZONTAL DATA FRAGMENTATION
    • Horizontal Fragmentation: Consists of a Restriction on a Relation, e.g. σ branch = ‘Stratford’ (Account)
    • Account relation:
        ACCOUNT  CUSTOMER  BRANCH     BALANCE
        200      JONES     STRATFORD  1000.00
        324      GRAY      BARKING     200.00
        345      SMITH     STRATFORD    23.17
        350      GREEN     BARKING     340.14
        400      ONO       BARKING     500.00
        456      KHAN      STRATFORD   333.00
  • 37. DISTRIBUTED DATABASES: HORIZONTAL DATA FRAGMENTATION
    • STRATFORD BRANCH
        ACCT NO.  CUSTOMER  BRANCH     BALANCE
        200       JONES     STRATFORD  1000.00
        345       SMITH     STRATFORD    23.17
        456       KHAN      STRATFORD   333.00
    • BARKING BRANCH
        ACCT NO.  CUSTOMER  BRANCH     BALANCE
        324       GRAY      BARKING     200.00
        350       GREEN     BARKING     340.14
        400       ONO       BARKING     500.00
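Horizontal fragmentation of the Account relation can be sketched as a restriction (row filter) per branch, with a union putting the fragments back together. The rows come from the slides; the dictionary keys and the `restrict` helper are names invented for this example.

```python
# The Account relation from the slides, one dict per row.
ACCOUNT = [
    {"acct_no": 200, "customer": "JONES", "branch": "STRATFORD", "balance": 1000.00},
    {"acct_no": 324, "customer": "GRAY", "branch": "BARKING", "balance": 200.00},
    {"acct_no": 345, "customer": "SMITH", "branch": "STRATFORD", "balance": 23.17},
    {"acct_no": 350, "customer": "GREEN", "branch": "BARKING", "balance": 340.14},
    {"acct_no": 400, "customer": "ONO", "branch": "BARKING", "balance": 500.00},
    {"acct_no": 456, "customer": "KHAN", "branch": "STRATFORD", "balance": 333.00},
]


def restrict(relation, predicate):
    """Restriction: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# One horizontal fragment per branch.
stratford = restrict(ACCOUNT, lambda row: row["branch"] == "STRATFORD")
barking = restrict(ACCOUNT, lambda row: row["branch"] == "BARKING")

# The union of the fragments reconstructs the original relation.
rebuilt = sorted(stratford + barking, key=lambda row: row["acct_no"])
```

Each row lands in exactly one fragment, so nothing is lost or duplicated when the fragments are unioned.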
  • 38. DISTRIBUTED DATABASES: VERTICAL DATA FRAGMENTATION
    • Vertical Fragmentation: Consists of a Projection on a Relation, e.g. Π S#, NAME, SITE, PHONE NO (Student)
    • Student relation:
        S#   NAME   SITE       PHONE NO       LOGIN    PASSWORD
        200  JONES  STRATFORD  0208-500-9000  JON200T  XXYY22
        324  GRAY   BARKING    0208-545-7528  GRA324S  ZZEE56
        456  KHAN   STRATFORD  0208-500-5821  KHA456T  KJTR78
  • 39. DISTRIBUTED DATABASES: VERTICAL DATA FRAGMENTATION
    • STUDENT ADMINISTRATION
        S#   NAME   SITE       PHONE NO.
        200  JONES  STRATFORD  0208-500-9000
        324  GRAY   BARKING    0208-545-7528
        456  KHAN   STRATFORD  0208-500-5821
    • NETWORK ADMINISTRATION
        S#   LOGIN-ID  PASSWORD
        200  JON200T   XXYY22
        324  GRA324S   ZZEE56
        456  KHA456T   KJTR78
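Vertical fragmentation of the Student relation can be sketched as two projections that both keep the key S#, so that a join on the key rebuilds the original rows. The data is taken from the slides; the dictionary keys and the `project` and `join_on_key` helpers are names invented for this example.

```python
# The Student relation from the slides, one dict per row.
STUDENT = [
    {"s_no": 200, "name": "JONES", "site": "STRATFORD",
     "phone": "0208-500-9000", "login": "JON200T", "password": "XXYY22"},
    {"s_no": 324, "name": "GRAY", "site": "BARKING",
     "phone": "0208-545-7528", "login": "GRA324S", "password": "ZZEE56"},
    {"s_no": 456, "name": "KHAN", "site": "STRATFORD",
     "phone": "0208-500-5821", "login": "KHA456T", "password": "KJTR78"},
]


def project(relation, columns):
    """Projection: keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in relation]


def join_on_key(left, right, key):
    """Join two fragments on their shared key column."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Both fragments keep the key so they can be recombined later.
student_admin = project(STUDENT, ["s_no", "name", "site", "phone"])
network_admin = project(STUDENT, ["s_no", "login", "password"])

rebuilt = join_on_key(student_admin, network_admin, "s_no")
```

Keeping S# in both fragments is what makes the split lossless; without a shared key the two fragments could not be matched up again.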
  • 40. DISTRIBUTED DATABASES: DISTRIBUTED CATALOG MANAGEMENT
    • Centralised Global Catalog
    • One site maintains the full global catalog. All changes to any local system catalog have to be propagated to the site maintaining the global catalog. This gives bad performance, creates a single point of failure, and compromises site autonomy.
    • Dispersed Catalog
    • There is no physical global catalog. Each time a remote data item is required, the catalogs from ALL other sites are examined for the item. This has severe performance penalties.
  • 41. DISTRIBUTED DATABASES: DISTRIBUTED CATALOG MANAGEMENT
    • Replicated Global Catalog
    • Each site maintains its own copy of the global catalog. Although this greatly speeds up remote data location, it is very inefficient to maintain. Details of every data item added, changed or deleted locally have to be propagated to ALL other sites.
    • Local-Master Catalog
    • Each site maintains both its local system catalog and a catalog of all of its data items that are replicated at other sites. This avoids compromising site autonomy, is fairly efficient, and is not a single point of failure.
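The lookup cost difference between a dispersed catalog and a replicated global catalog can be sketched by counting remote catalog examinations. The site names come from the slides; the item names, the `LOCAL_CATALOGS` dictionary and the functions are assumptions made for the example.

```python
# Hypothetical local catalogs: site -> names of data items stored there.
LOCAL_CATALOGS = {
    "Stratford": {"account_stratford"},
    "Barking": {"account_barking"},
    "Leyton": {"student"},
}


def locate_dispersed(item, asking_site):
    """Dispersed catalog: examine the catalog of every other site."""
    found_at, remote_lookups = None, 0
    for site, items in LOCAL_CATALOGS.items():
        if site == asking_site:
            continue
        remote_lookups += 1  # one remote catalog examined
        if item in items:
            found_at = site
    return found_at, remote_lookups


def build_global_catalog():
    """Replicated catalog: the full item -> site map kept at every site."""
    return {item: site
            for site, items in LOCAL_CATALOGS.items()
            for item in items}


def locate_replicated(item, local_copy):
    """Lookup is purely local, so no remote catalog is examined."""
    return local_copy.get(item), 0
```

The sketch shows only the read side. The price of the replicated scheme is paid on updates, since every change has to reach the copy held at each site.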
  • 42. DISTRIBUTED DATABASES: DISTRIBUTED TRANSACTIONS
    • [Diagram: three Stratford clients, the Stratford, Barking and Leyton DBMSs, and a DB at each of the three sites.]
    • Global Transaction
        • (a) Debit Stratford A/C £500
        • (b) Credit Barking A/C £350
        • (c) Credit Leyton A/C £150
    • ATOMIC DISTRIBUTED TRANSACTION
  • 43. TWO-PHASE COMMIT (2PC) - OK
  • 44. TWO-PHASE COMMIT (2PC) - ABORT ‘Global Abort’
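The two outcomes of two-phase commit can be sketched with the global transaction from the earlier slide (debit Stratford £500, credit Barking £350, credit Leyton £150). This is a minimal in-memory illustration: the `Participant` class, its methods and the returned strings are invented for the example, and real 2PC also needs logging, timeouts and recovery, all omitted here.

```python
class Participant:
    """One site taking part in the global transaction."""

    def __init__(self, name, balance, can_commit=True):
        self.name = name
        self.balance = balance
        self.can_commit = can_commit  # False simulates a failing site
        self.pending = 0

    def prepare(self, amount):
        """Phase 1: vote on whether the change can be committed."""
        if not self.can_commit or self.balance + amount < 0:
            return False
        self.pending = amount
        return True

    def commit(self):
        """Phase 2 (all voted yes): make the change permanent."""
        self.balance += self.pending
        self.pending = 0

    def abort(self):
        """Phase 2 (any no vote): discard the pending change."""
        self.pending = 0


def two_phase_commit(changes):
    """Co-ordinator. changes is a list of (participant, amount) pairs."""
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [site.prepare(amount) for site, amount in changes]
    # Phase 2: commit everywhere only if every vote was yes.
    if all(votes):
        for site, _ in changes:
            site.commit()
        return "global commit"
    for site, _ in changes:
        site.abort()
    return "global abort"
```

Because no balance is changed until every site has voted yes, a single failing site leaves all three accounts untouched, which is the atomicity the slides call for.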
  • 45. DISTRIBUTED DATABASES: DISADVANTAGES OF DDBMSs
    • Architectural complexity.
    • Cost.
    • Security.
    • Integrity control more difficult.
    • Lack of standards.
    • Lack of experience.
    • Database design more complex.
