Lecture December 21, 2011

         Distributed Database
         Management System
                                      By
                         Mangesh R. Wanjari
                       Asst. Professor, Department of CSE
         Shri Ramdeobaba College of Engineering and Management, Nagpur
Evolution of DDBMS

        Distributed database management systems (DDBMS)
           - Interconnected computer systems
           - Data/processing functions reside on multiple sites
        1970s: Centralized DBMS
        1980s: Social and technical changes
           - Ad hoc capability required
           - Decentralized management structure common
        1990s: New forces
           - Computational capacity of personal computers
           - Internet and the World Wide Web used for data access
              and distribution
           - Data analysis through data mining and data warehousing

  Wednesday, December 21, 2011   Distributed Database Systems
Overview

           •   What and why?
           •   The Distributed Database Management Systems
           •   The Reference Architecture for Distributed Databases
           •   Data Fragmentation
           •   Distributed Transparency
           •   Distributed Database Design




Definition
                     • A distributed database (DDB) is a collection of
                       multiple, logically interrelated databases distributed
                       over a computer network.
                     • A distributed database management system (DDBMS)
                       is the software that manages the DDB and provides an
                       access mechanism that makes this distribution
                       transparent to the users.
                     • Distributed database system (DDBS) = DDB + DDBMS




Features of Distributed Versus Centralized Databases


              • Centralized Control
              • Data Independence
              • Reduction in Redundancy
              • Complex Physical Structures and efficient
                access
              • Integrity, Recovery, and Concurrency control
              • Privacy and Security




Why Distributed Databases


             •    Organizational and economic reasons
             •    Interconnection of existing databases
             •    Incremental growth
             •    Reduced communication overhead
             •    Performance considerations
             •    Reliability and availability




Introduction
                     • The traditional database approach keeps all data at a
                       central site and accesses it mostly through a
                       client-server model.
                     • In a distributed database system, the data are
                       distributed over geographically separate sites.
                     • For example, consider a bank with four branches at
                       different sites.
                     • There are two types of transaction: local
                       transactions and global transactions.
                     • A global transaction has to access data at other
                       sites, which raises additional concerns: network
                       communication, speed, efficient access, integrity,
                       recovery, concurrency control, privacy, and security.


Centralized Database Management System




Distributed Processing Environment




Distributed Database Environment




Traditional distributed processing architecture

[Figure: four sites (Nagpur, Mumbai, Delhi, Bangalore), each with several clients on its own LAN. A single DBMS serves the clients of all four sites.]
Distributed Database Architecture

[Figure: four sites (Nagpur, Mumbai, Delhi, Bangalore), each with several clients on its own LAN and its own DBMS.]
Distributed Database Management System




 Components of a Distributed DBMS            Possible access methods


Reference Architecture for Distributed DBMS



                                                     The reference model has two
                                                     main parts

                                                     1. Site independent schemas
                                                     2. Site dependent schemas




FRAGMENTS AND PHYSICAL IMAGES FOR A GLOBAL RELATION




What is so fascinating about this architecture?

                 The three most important features that motivate this
                 architecture are:

                 • Separation of data fragmentation and allocation.

                 • The control of redundancy.

                 • The independence from local DBMS.




Types of Data Fragmentation

  • Horizontal Fragmentation
  • Vertical Fragmentation
  • Hybrid/Mixed Fragmentation


  There are some rules that must be followed when
  defining fragments:

  • Completeness condition
  • Reconstruction condition
  • Disjointness condition



Horizontal Fragmentation
      Let a global relation be
      SUPPLIER (SNUM, NAME, CITY)

      SUPPLIER contains the supplier number, the supplier name, and the city
      where the supplier is located. If every supplier is located in either Nagpur
      ("NGP") or Mumbai ("MUM"), the horizontal fragmentation can be defined
      as follows:

      SUPPLIER1 = SL CITY = "NGP" SUPPLIER
      SUPPLIER2 = SL CITY = "MUM" SUPPLIER

      It is always possible to reconstruct the SUPPLIER global relation through the
      union operation:

      SUPPLIER = SUPPLIER1 UN SUPPLIER2

      The qualifications of the two fragments are q1: CITY = "NGP" and q2: CITY = "MUM".
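
The SUPPLIER example can be tried out in a few lines of Python. This is only a sketch: the sample tuples and the helper name `select` are made up for illustration, and relations are modelled as plain sets of (SNUM, NAME, CITY) tuples rather than tables in a real DBMS.

```python
# Hypothetical sample data; each tuple is (SNUM, NAME, CITY).
SUPPLIER = {
    (1, "Asha", "NGP"),
    (2, "Ravi", "MUM"),
    (3, "Meena", "NGP"),
}


def select(relation, predicate):
    """SL: keep only the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


# Horizontal fragments defined by the qualifications q1 and q2.
SUPPLIER1 = select(SUPPLIER, lambda t: t[2] == "NGP")
SUPPLIER2 = select(SUPPLIER, lambda t: t[2] == "MUM")

# Completeness and reconstruction: the union gives back the global relation.
reconstructed = SUPPLIER1 | SUPPLIER2
# Disjointness: no tuple belongs to both fragments.
overlap = SUPPLIER1 & SUPPLIER2

print(reconstructed == SUPPLIER, overlap == set())
```

If a supplier from a third city were added, the union would no longer equal SUPPLIER, which is exactly the completeness condition failing.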


Horizontal Fragmentation (contd.)

[Figure: a relation with attributes A1, A2, ..., An and tuples T1 to Tn is split by rows. Tuples T1 to T60 are stored at Site 1 and tuples T61 to Tn at Site 2.]

Derived Horizontal Fragmentation
      SUPPLY (SNUM, PNUM, DEPTNUM, QUAN)

      SUPPLY1 = SUPPLY SJ SNUM = SNUM SUPPLIER1
      SUPPLY2 = SUPPLY SJ SNUM = SNUM SUPPLIER2
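
The semi-join (SJ) that defines the derived fragments can be sketched as follows. The sample data and the function name `semi_join_on_snum` are assumptions made for this illustration; SNUM is taken to be the first field of both relations.

```python
# Hypothetical fragments of SUPPLIER; each tuple is (SNUM, NAME, CITY).
SUPPLIER1 = {(1, "Asha", "NGP"), (3, "Meena", "NGP")}
SUPPLIER2 = {(2, "Ravi", "MUM")}

# Hypothetical SUPPLY tuples: (SNUM, PNUM, DEPTNUM, QUAN).
SUPPLY = {
    (1, "P1", 10, 100),
    (2, "P1", 20, 50),
    (3, "P2", 10, 75),
}


def semi_join_on_snum(supply, supplier_fragment):
    """SJ on SNUM: keep the SUPPLY tuples whose SNUM occurs in the fragment."""
    snums = {s[0] for s in supplier_fragment}
    return {t for t in supply if t[0] in snums}


SUPPLY1 = semi_join_on_snum(SUPPLY, SUPPLIER1)
SUPPLY2 = semi_join_on_snum(SUPPLY, SUPPLIER2)

print(sorted(SUPPLY1))
print(sorted(SUPPLY2))
```

Each SUPPLY tuple ends up in the fragment that matches its supplier's fragment, so a join between SUPPLY and SUPPLIER only has to pair SUPPLY1 with SUPPLIER1 and SUPPLY2 with SUPPLIER2.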

VERTICAL FRAGMENTATION
[Figure: the original relation R (A1, A2, A3, A4) with tuples t1 to tn is split by columns into RS1 (A1, A2, TID) at Site 1 and RS2 (TID, A3, A4) at Site 2. TID (tuple ID) is a hidden attribute that allows a simple join reconstruction: R = RS1 JN RS2 JN ... JN RSn, with join condition RS1.TID = RS2.TID.]
VERTICAL FRAGMENTATION
          EMPLOYEE (EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)

          A vertical fragmentation of this relation can be defined as

          EMPLOYEE1 = PJ EMPNUM, NAME, MGRNUM, DEPTNUM EMPLOYEE
          EMPLOYEE2 = PJ EMPNUM, SAL, TAX EMPLOYEE

          The fragmentation could, for instance, reflect an organization in which
          salaries and taxes are managed separately. The reconstruction of relation
          EMPLOYEE can be obtained as

          EMPLOYEE = EMPLOYEE1 JN EMPNUM = EMPNUM EMPLOYEE2
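
A small Python sketch of the same vertical fragmentation, with rows modelled as dictionaries. The sample rows and the helper names `project` and `join_on` are illustrative assumptions, not part of the lecture.

```python
# Hypothetical sample rows of the global relation EMPLOYEE.
EMPLOYEE = [
    {"EMPNUM": 1, "NAME": "Asha", "SAL": 500, "TAX": 50, "MGRNUM": 9, "DEPTNUM": 5},
    {"EMPNUM": 2, "NAME": "Ravi", "SAL": 600, "TAX": 60, "MGRNUM": 9, "DEPTNUM": 15},
]


def project(rows, attributes):
    """PJ: keep only the named attributes and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


def join_on(left, right, key):
    """JN: pair rows with equal key values and merge their attributes."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


EMPLOYEE1 = project(EMPLOYEE, ["EMPNUM", "NAME", "MGRNUM", "DEPTNUM"])
EMPLOYEE2 = project(EMPLOYEE, ["EMPNUM", "SAL", "TAX"])

# The key EMPNUM is kept in both fragments so the join can rebuild each row.
rebuilt = join_on(EMPLOYEE1, EMPLOYEE2, "EMPNUM")
print(rebuilt == EMPLOYEE)
```

Dropping EMPNUM from either fragment would make the reconstruction impossible, which is why the key (or a hidden tuple ID) is repeated in every vertical fragment.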




MIXED FRAGMENTATION
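[Figure: relation R (A1, A2, A3, A4, A5) is split vertically into salary attributes (A1, A2, A3) and benefit attributes (A4, A5), and horizontally by region (USA, Europe). This gives four fragments: Rs1 and Rs2 hold A1, A2, A3; Rs3 and Rs4 hold A4, A5. Rs1 and Rs3 belong to USA, Rs2 and Rs4 to Europe.]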
MIXED FRAGMENTATION
EMPLOYEE (EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)

The following is a mixed fragmentation that is obtained by applying the
vertical fragmentation of the previous example, followed by a
horizontal fragmentation on DEPTNUM:

EMPLOYEE1 = SL DEPTNUM <= 10 PJ EMPNUM, NAME, MGRNUM, DEPTNUM EMPLOYEE
EMPLOYEE2 = SL 10 < DEPTNUM <= 20 PJ EMPNUM, NAME, MGRNUM, DEPTNUM EMPLOYEE
EMPLOYEE3 = SL DEPTNUM > 20 PJ EMPNUM, NAME, MGRNUM, DEPTNUM EMPLOYEE
EMPLOYEE4 = PJ EMPNUM, NAME, SAL, TAX EMPLOYEE

The reconstruction of relation EMPLOYEE is defined by
the following expression:

EMPLOYEE = UN (EMPLOYEE1, EMPLOYEE2, EMPLOYEE3)
JN EMPNUM=EMPNUM PJ EMPNUM, SAL, TAX EMPLOYEE4
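
The mixed fragmentation and its reconstruction can be checked with a short Python sketch. The sample rows are hypothetical, the horizontal ranges are assumed to partition DEPTNUM into up to 10, 11 to 20, and above 20, and list concatenation stands in for UN because those ranges do not overlap.

```python
# Hypothetical sample rows, one per DEPTNUM range.
EMPLOYEE = [
    {"EMPNUM": 1, "NAME": "Asha", "SAL": 500, "TAX": 50, "MGRNUM": 9, "DEPTNUM": 5},
    {"EMPNUM": 2, "NAME": "Ravi", "SAL": 600, "TAX": 60, "MGRNUM": 9, "DEPTNUM": 15},
    {"EMPNUM": 3, "NAME": "Meena", "SAL": 700, "TAX": 70, "MGRNUM": 9, "DEPTNUM": 25},
]


def project(rows, attributes):
    """PJ: keep only the named attributes (EMPNUM is a key, so no duplicates arise)."""
    return [{a: row[a] for a in attributes} for row in rows]


def select(rows, predicate):
    """SL: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


# Vertical step, then horizontal step on DEPTNUM.
admin = project(EMPLOYEE, ["EMPNUM", "NAME", "MGRNUM", "DEPTNUM"])
EMPLOYEE1 = select(admin, lambda row: row["DEPTNUM"] <= 10)
EMPLOYEE2 = select(admin, lambda row: 10 < row["DEPTNUM"] <= 20)
EMPLOYEE3 = select(admin, lambda row: row["DEPTNUM"] > 20)
EMPLOYEE4 = project(EMPLOYEE, ["EMPNUM", "NAME", "SAL", "TAX"])

# Reconstruction: union of the horizontal fragments, joined with the pay data.
union = EMPLOYEE1 + EMPLOYEE2 + EMPLOYEE3
pay = project(EMPLOYEE4, ["EMPNUM", "SAL", "TAX"])
rebuilt = [{**u, **p} for u in union for p in pay if u["EMPNUM"] == p["EMPNUM"]]

print(sorted(rebuilt, key=lambda row: row["EMPNUM"]) == EMPLOYEE)
```

NAME is projected out of EMPLOYEE4 before the join because it already appears in EMPLOYEE1 to EMPLOYEE3.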


Transparencies as seen by simple application
                           Fragmentation transparency

 Select NAME into $NAME
 from SUPPLIER
 where SNUM=$SNUM


               Location transparency




                      Local Mapping
                      transparency




Transparencies as seen by simple application




     No transparency




Topics left for you from the syllabus



                  • Distributed database access primitives

                  •     Integrity constraints in Distributed databases




Distributed Database Design

             Any database design has to address the following issues:

             1. Designing the conceptual schema
             2. Designing the physical storage

             Distributing the data adds two more:

             3. Designing how to fragment data
             4. Designing how to allocate fragments to sites




Distributed Database Design


         Objectives of the design of data distribution

         •     Processing locality
         •     Availability and reliability of distributed data
         •     Workload distribution
         •     Storage cost and availability
         •     Distributed database design




Distributed Database Design

      Two approaches to design

      1. Top-down approach
         • Start by designing the global schema, then design the
           fragmentation of the database, and finally allocate the fragments
           to the sites, creating the physical images
         • Suitable for systems which are developed from scratch

      2. Bottom-up approach
         • The selection of a common database model for describing the global
           schema of the database.
         • The translation of each local schema into the common data model.
         • The integration of the local schemata into a common global schema.


The Design of Database Fragmentation

         Horizontal Fragmentation (Primary)

         Let P = {p1, p2, …, pn} be a set of simple predicates. For P
         to represent the fragmentation correctly and efficiently, P
         must be complete and minimal.

         1. P is complete if and only if any two tuples belonging to
            the same fragment are referenced with the same
            probability by any application.
         2. P is minimal if all of its predicates are relevant.


The Design of Database Fragmentation

      Horizontal Fragmentation (Derived)

      •     A distributed join is a join between horizontally fragmented relations
           which is represented by Join Graphs
      •    Join Graphs
            • Total
            • Reduced
                 • Simple
                 • Partitioned




      Derived fragments: Ri = R SJ_F Si

The Design of Database Fragmentation


       Vertical Fragmentation

       1. Split approach
       2. Grouping approach




The Design of Database Fragmentation


       Mixed Fragmentation

       1. Applying Vertical Fragmentation to Horizontal
          fragments

       2. Applying Horizontal Fragmentation to Vertical
          fragments




The Allocation of Fragments

   General criteria for fragment allocation

   1. Redundant
   2. Non-Redundant

   If fragments are replicated, the allocation problem is more complex because
   1. The degree of replication of each fragment becomes a
       variable of the problem.
   2. Modeling read applications is complicated by the fact that
       the applications can now select among several alternative
       sites for accessing fragments



The Allocation of Fragments

          To determine a redundant allocation of fragments,
          either of the following methods can be used:

          1. All beneficial sites: Determine the set of all sites
             where the benefit of allocating one copy of the fragment
             is higher than the cost, and allocate a copy of the
             fragment to each site in this set.

          2. Additional replication: First determine the solution of the
             non-replicated problem, then progressively introduce
             replicated copies starting from the most beneficial; the
             process terminates when no additional replication is
             beneficial.



Measure of costs and benefits of fragment allocation

          Some Definitions

          • i is the fragment index
          • j is the site index
          • k is the application index
          • fkj is the frequency of application k at site j
          • rki is the number of retrieval references of
            application k to fragment i
          • uki is the number of update references of
            application k to fragment i
          • nki = rki + uki is the total number of references of
            application k to fragment i



Measure of costs and benefits of fragment allocation
    Horizontal fragmentation:

    1. Using the ‘best-fit’ approach for a non-replicated allocation, we place Ri at the
        site where the number of references to Ri is maximum. The number of local
        references of Ri at site j is
                                Bij = ∑k fkj nki
                 Ri is allocated at site j* such that Bij* is maximum.

    2. Using the ‘all beneficial sites’ method for replicated allocation, we place Ri at
       all sites j where the cost of retrieval references of applications is larger than
       the cost of update references to Ri from applications at any other site. Bij is
       evaluated as the difference:
                                B_{ij} = \sum_k f_{kj} r_{ki} - C \sum_k \sum_{j' \neq j} f_{kj'} u_{ki}
         C is a constant which measures the ratio between the cost of an update and
         a retrieval access. Ri is allocated at all sites j for which Bij is positive.
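
The two measures can be computed directly from the definitions. The sketch below uses a made-up workload (two applications, three sites, one fragment) and hypothetical function names. It takes nki as the total references rki + uki and treats a site as beneficial when its Bij is positive.

```python
# Hypothetical workload for a single fragment Ri.
f = [            # f[k][j]: frequency of application k at site j
    [10, 0, 2],
    [0, 8, 1],
]
r = [5, 4]       # r[k]: retrieval references of application k to Ri
u = [1, 0]       # u[k]: update references of application k to Ri
C = 2.0          # assumed cost ratio between an update and a retrieval

apps = range(len(f))
sites = range(len(f[0]))


def best_fit_benefit(j):
    """Bij = sum_k fkj * nki, with nki = rki + uki."""
    return sum(f[k][j] * (r[k] + u[k]) for k in apps)


def all_beneficial_benefit(j):
    """Bij = sum_k fkj * rki - C * sum_k sum_{j' != j} fkj' * uki."""
    local_retrievals = sum(f[k][j] * r[k] for k in apps)
    remote_updates = sum(f[k][jp] * u[k] for k in apps for jp in sites if jp != j)
    return local_retrievals - C * remote_updates


# Non-replicated allocation: the single site with the most references.
best_site = max(sites, key=best_fit_benefit)
# Replicated allocation: every site whose benefit is positive.
beneficial_sites = [j for j in sites if all_beneficial_benefit(j) > 0]

print(best_site, beneficial_sites)
```

With this workload, site 0 wins the best-fit allocation, while sites 0 and 1 both qualify for a copy under the all-beneficial-sites method.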

Measure of costs and benefits of fragment allocation


   3. Using the ‘additional replication’ method for replicated allocation, we can
      measure the benefit of placing a new copy of Ri in terms of increased
      reliability and availability of the system.

   Let di denote the degree of redundancy of Ri, and let Fi denote the benefit of
   having Ri fully replicated at each site. The following function was introduced to
   measure this benefit:
                                     \beta(d_i) = (1 - 2^{1 - d_i}) F_i
   Note that β(1) = 0, β(2) = Fi / 2, β(3) = 3 Fi / 4, and so on.
   We evaluate the benefit of introducing a new copy of Ri at site j by modifying the
   formula of case 2 as follows:
                            B_{ij} = \sum_k f_{kj} r_{ki} - C \sum_k \sum_{j' \neq j} f_{kj'} u_{ki} + \beta(d_i)
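
The replication benefit is easy to check numerically. In this sketch the function names and the sample numbers are assumptions for illustration; `base_benefit` stands for the case 2 value of Bij at the candidate site.

```python
def beta(d, F):
    """Benefit of redundancy degree d: (1 - 2**(1 - d)) * F."""
    return (1 - 2 ** (1 - d)) * F


def benefit_with_replication(base_benefit, d, F):
    """Case 2 benefit of a site plus the availability term beta(d)."""
    return base_benefit + beta(d, F)


F = 16  # hypothetical benefit of having the fragment replicated at every site

# The values noted above: beta(1) = 0, beta(2) = F / 2, beta(3) = 3 F / 4.
print(beta(1, F), beta(2, F), beta(3, F))

# A site with a slightly negative case 2 benefit can still be worth a copy.
print(benefit_with_replication(-6.0, 2, F))
```

Because each further copy adds only half of the remaining gap to Fi, the availability term grows more slowly with every copy, so additional replicas are accepted less and less often.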



References


             1. “Distributed Databases: Principles and Systems”, Stefano Ceri,
                Giuseppe Pelagatti, McGraw-Hill Book Company, 1984.

             2. “Database System Concepts”, Abraham Silberschatz, Henry F.
                Korth, S. Sudarshan, Third Edition, The McGraw-Hill
                Companies, Inc., 1997.

             3. “Database Systems: Design, Implementation, and Management”,
                Peter Rob, Carlos Coronel, Course Technology, 2000.

             4. “Principles of Distributed Database Systems”, M. T. Özsu and P.
                Valduriez, 3rd edition, Springer, 2011.




Motivation is what gets you started and Habit is what keeps you going…




                         Thanks a lot for patient listening!!

                                     Questions?

                                You can reach me at
                            mangeshwanjari[at]gmail.com




Lecture 1 ddbms

  • 1.
    Lecture December 21,2011 Distributed Database Management System By Mangesh R. Wanjari Asst. Professor, Department of CSE Shri Ramdeobaba College of Engineering and Management, Nagpur
  • 2.
    Evolution of DDBMS Decentralized database management systems (DDBMS) - Interconnected computer systems - Data/processing functions reside on multiple sites 1970’s: Centralized DBMS 1980’s: Social and Technical Changes - Ad hoc capability required - Decentralized management structure common 1990’s: New forces - Computational capacity of Personal Computers - Internet and the World Wide Web used for data access and distribution - Data analysis through data mining and data warehousing Wednesday, Dcember 21, 2011 Distributed Database Systems 2
  • 3.
    Overview • What and why? • The Distributed Database Management Systems • The Reference Architecture for Distributed Databases • Data Fragmentation, • Distributed Transparency • Distributed Database Design Wednesday, Dcember 21, 2011 Distributed Database Systems 3
  • 4.
    Definition • A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. • A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users. • Distributed database system (DDBS) = DDB + D–DBMS Wednesday, Dcember 21, 2011 Distributed Database Systems 4
  • 5.
    Features of DistributedVersus Centralized Databases • Centralized Control • Data Independence • Reduction in Redundancy • Complex Physical Structures and efficient access • Integrity, Recovery, and Concurrency control • Privacy and Security Wednesday, Dcember 21, 2011 Distributed Database Systems 5
  • 6.
    Why Distributed Databases • Organizational and economic reasons • Interconnection of existing databases • Incremental growth • Reduced communication overhead • Performance considerations • Reliability and availability Wednesday, Dcember 21, 2011 Distributed Database Systems 6
  • 7.
    Overview • What and why? • The Distributed Database Management Systems • The Reference Architecture for Distributed Databases • Data Fragmentation, • Distributed Transparency • Distributed Database Design Wednesday, Dcember 21, 2011 Distributed Database Systems 7
  • 8.
    Introduction • The traditional database approach keeps all data centrally and then accesses them mostly in a client server model • in a distributed database system data are distributed over site geographically • Say for example there are four branches of a bank at different sites • There will be two types of transaction, one is local transaction and the other is global transaction • In global transaction case the program has to access data over site, which needs much attention, such as, transaction over network, speed, efficient access, integrity, recovery, concurrency control, privacy, security and a lot of things Wednesday, Dcember 21, 2011 Distributed Database Systems 8
  • 9.
    Centralized Database ManagementSystem Wednesday, Dcember 21, 2011 Distributed Database Systems 9
  • 10.
    Distributed Processing Environment Wednesday, Dcember 21, 2011 Distributed Database Systems 10
  • 11.
    Distributed Database Environment Wednesday, Dcember 21, 2011 Distributed Database Systems 11
  • 12.
    Traditional distributed processingarchitecture CLIENT CLIENT LAN CLIENT CLIENT LAN CLIENT CLIENT CLIENT CLIENT Nagpur Mumbai CLIENT CLIENT CLIENT CLIENT LAN LAN DBMS CLIENT CLIENT CLIENT CLIENT Delhi Bangalore Wednesday, Dcember 21, 2011 Distributed Database Systems 12
  • 13.
    Distributed Database Architecture CLIEN CLIENT CLIENT CLIENT CLIENT T LAN DBMS DBMS CLIENT CLIENT CLIENT CLIENT Mumbai Nagpur CLIENT CLIEN CLIENT CLIENT CLIENT T LAN DBMS DBMS CLIENT CLIENT CLIENT CLIENT Delhi Bangalore Wednesday, Dcember 21, 2011 Distributed Database Systems 13
  • 14.
    Distributed Database ManagementSystem Components of a Distributed DBMS Possible access methods Wednesday, Dcember 21, 2011 Distributed Database Systems 14
  • 15.
    Overview • What and why? • The Distributed Database Management Systems • The Reference Architecture for Distributed Databases • Data Fragmentation, • Distributed Transparency • Distributed Database Design Wednesday, Dcember 21, 2011 Distributed Database Systems 15
  • 16.
    Reference Architecture forDistributed DBMS The reference model has two main parts 1. Site independent schemas 2. Site dependent schemas Wednesday, Dcember 21, 2011 Distributed Database Systems 16
  • 17.
    FRAGMENTS AND PHYSICALIMAGES FOR A GLOBAL RELATION Wednesday, Dcember 21, 2011 Distributed Database Systems 17
  • 18.
    What is sofascinating about this architecture? The most important three features that motivates in designing this architecture are • Separation of data fragmentation and allocation. • The control of redundancy. • The independence from local DBMS. Wednesday, Dcember 21, 2011 Distributed Database Systems 18
  • 19.
    Overview • What and why? • The Distributed Database Management Systems • The Reference Architecture for Distributed Databases • Data Fragmentation, • Distributed Transparency • Distributed Database Design Wednesday, Dcember 21, 2011 Distributed Database Systems 19
  • 20.
    Types of DataFragmentation • Horizontal Fragmentation • Vertical Fragmentation • Hybrid/Mixed Fragmentation There are some rules that must be followed when defining fragments: • Completeness condition • Reconstruction condition • Disjointness condition Wednesday, Dcember 21, 2011 Distributed Database Systems 20
  • 21.
    Horizontal Fragmentation Let a global relation be SUPPLIER (SUM, NAME, CITY) Here the SUPPLIER contains supplier number, supplier name and the city where the supplier lives. However if the entire supplier comes from Nagpur city (“NGP”) and Mumbai city (“MUM”) then the horizontal fragmentation can be defined in the following way: SUPPLIER1 = SL CITY = ”NGP” SUPPLIER SUPPLIER2 = SL CITY = ”MUM” SUPPLIER It is always possible to reconstruct the SUPPLIER global relation through the union operation: SUPPLIER = SUPPLIER1 UN SUPPLIER2 q1: CITY=“NGP” AND q2: CITY=“MUM” Wednesday, Dcember 21, 2011 Distributed Database Systems 21
  • 22.
    Horizontal Fragmentation CNTD.. A1 A2 ………. An T1 A1 A2 ………. An T2 T1 T3 1 T2 . 1 T3 .T60 1 . 2 Site 1 .T60 2 T61 A1 A2 ………. An 3 T61 . 3 . . 3 . Tn Tn Site 2 Derived Horizontal Fragmentation SUPPLY(SNUM, PNUM, DEPTNUM, QUAN) SUPPLY1 =SUPPLY SJ SNUM =SNUM SUPPLIER1 SUPPLY2 =SUPPLY SJ SNUM =SNUM SUPPLIER2 Wednesday, Dcember 21, 2011 Distributed Database Systems 22
  • 23.
    VERTICAL FRAGMENTATION A1 A2 A3 A4 Original t1 How to Reconstruct: (R) Relation t2 R=Rs1 Rs2 Rsn TID –Tuple ID Hidden Attribute to ensure account tn and simple join reconstruction A1 A2 TID TID A3 A4 RS2 RS1 t1 1 1 t1 t2 2 2 t2 RS1.TID=RS2.TID n n tn tn Join condition SITE1 SITE2 Wednesday, Dcember 21, 2011 Distributed Database Systems 23
  • 24.
    VERTICAL FRAGMENTATION EMPLOYEE (EMPNUM, SAL, TAX, MGRNUM, DEPTNUM) A vertical fragmentation of this relation can be defined as EMPLOYEE1 = PJ EMPNUM, NAME, MGRNUM, DEPTNUM EMPLOYEE EMPLOYEE2 = PJ EMPNUM, SAL, TAX EMPLOYEE The fragmentation could, for instance, reflect an organization in which salaries and taxes are managed separately. The reconstruction of relation EMPLOYEE can be obtained as EMPLOYEE = EMPLOYEE1 JN EMPNUM = EMPNUM EMPLOYEE2 Wednesday, Dcember 21, 2011 Distributed Database Systems 24
  • 25.
    MIXED FRAGMENTATION
    [Figure: relation R with attributes A1 ... A5 is split horizontally into USA and Europe tuples and vertically into A1, A2, A3 (salary attributes) and A4, A5 (benefit attributes), giving the four fragments Rs1, Rs2, Rs3 and Rs4.]
  • 26.
    MIXED FRAGMENTATION
    EMPLOYEE (EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)
    The following mixed fragmentation is obtained by applying the vertical fragmentation of the previous example, followed by a horizontal fragmentation on DEPTNUM:
    EMPLOYEE1 = SL_{DEPTNUM <= 10} PJ_{EMPNUM, NAME, MGRNUM, DEPTNUM} EMPLOYEE
    EMPLOYEE2 = SL_{10 < DEPTNUM <= 20} PJ_{EMPNUM, NAME, MGRNUM, DEPTNUM} EMPLOYEE
    EMPLOYEE3 = SL_{DEPTNUM > 20} PJ_{EMPNUM, NAME, MGRNUM, DEPTNUM} EMPLOYEE
    EMPLOYEE4 = PJ_{EMPNUM, NAME, SAL, TAX} EMPLOYEE
    The reconstruction of relation EMPLOYEE is defined by the following expression:
    EMPLOYEE = UN (EMPLOYEE1, EMPLOYEE2, EMPLOYEE3) JN_{EMPNUM = EMPNUM} PJ_{EMPNUM, SAL, TAX} EMPLOYEE4
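    The reconstruction expression can be checked on toy data with a short Python sketch. The rows and the helper names are invented. The sketch assumes the third horizontal fragment covers DEPTNUM > 20, so that the three DEPTNUM ranges do not overlap.

```python
# Invented EMPLOYEE rows; EMPNUM is assumed to be the key.
EMPLOYEE = [
    {"EMPNUM": 1, "NAME": "Asha", "SAL": 50000, "TAX": 5000, "MGRNUM": 3, "DEPTNUM": 5},
    {"EMPNUM": 2, "NAME": "Ravi", "SAL": 42000, "TAX": 4000, "MGRNUM": 3, "DEPTNUM": 15},
    {"EMPNUM": 3, "NAME": "Meena", "SAL": 61000, "TAX": 7000, "MGRNUM": 3, "DEPTNUM": 25},
]

def select(relation, predicate):
    """SL: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attrs):
    """PJ: keep only the listed attributes (no duplicates arise here,
    because every projection below retains the key EMPNUM)."""
    return [{a: row[a] for a in attrs} for row in relation]

def join(left, right, key):
    """JN: equi-join on a key attribute that is unique in `right`."""
    by_key = {row[key]: row for row in right}
    return [{**l, **by_key[l[key]]} for l in left if l[key] in by_key]

descriptive = project(EMPLOYEE, ["EMPNUM", "NAME", "MGRNUM", "DEPTNUM"])
EMPLOYEE1 = select(descriptive, lambda r: r["DEPTNUM"] <= 10)
EMPLOYEE2 = select(descriptive, lambda r: 10 < r["DEPTNUM"] <= 20)
EMPLOYEE3 = select(descriptive, lambda r: r["DEPTNUM"] > 20)
EMPLOYEE4 = project(EMPLOYEE, ["EMPNUM", "NAME", "SAL", "TAX"])

# EMPLOYEE = UN(EMPLOYEE1, EMPLOYEE2, EMPLOYEE3) JN PJ[EMPNUM, SAL, TAX] EMPLOYEE4
union = EMPLOYEE1 + EMPLOYEE2 + EMPLOYEE3
rebuilt = join(union, project(EMPLOYEE4, ["EMPNUM", "SAL", "TAX"]), "EMPNUM")
print(rebuilt == EMPLOYEE)
```

    NAME is stored both in the horizontal fragments and in EMPLOYEE4, so the reconstruction projects it out of EMPLOYEE4 before the join.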
  • 28.
    Transparencies as seen by a simple application
    • Fragmentation transparency: Select NAME into $NAME from SUPPLIER where SNUM = $SNUM
    • Location transparency
    • Local mapping transparency
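    One way to picture the three levels is as three lookup functions that ask the application for progressively less detail. This is a hedged sketch only: the in-memory "sites", the catalog dictionaries and all function names are hypothetical and do not correspond to any real DDBMS interface.

```python
# Hypothetical in-memory sites, each storing one SUPPLIER fragment as
# {SNUM: NAME}. The data is invented.
SITES = {
    "site1": {"SUPPLIER1": {1: "Asha", 3: "Meena"}},
    "site2": {"SUPPLIER2": {2: "Ravi", 4: "Kiran"}},
}
# Hypothetical catalog: which fragments make up a global relation,
# and at which site each fragment is stored.
FRAGMENTS = {"SUPPLIER": ["SUPPLIER1", "SUPPLIER2"]}
ALLOCATION = {"SUPPLIER1": "site1", "SUPPLIER2": "site2"}

def read_fragment_at(site, fragment, snum):
    """Local mapping transparency: the caller names the fragment and the site."""
    return SITES[site][fragment].get(snum)

def read_fragment(fragment, snum):
    """Location transparency: the caller names the fragment only;
    the site is looked up in the allocation catalog."""
    return read_fragment_at(ALLOCATION[fragment], fragment, snum)

def read_global(relation, snum):
    """Fragmentation transparency: the caller names the global relation only."""
    for fragment in FRAGMENTS[relation]:
        name = read_fragment(fragment, snum)
        if name is not None:
            return name
    return None

print(read_global("SUPPLIER", 2))
```

    At the fragmentation-transparency level the application code stays unchanged if SUPPLIER is later refragmented or moved. At the lower levels the fragment names, and then the site names, leak into the application.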
  • 29.
    Transparencies as seen by a simple application
    • No transparency
  • 30.
    Topics left for you from the syllabus
    • Distributed database access primitives
    • Integrity constraints in distributed databases
  • 32.
    Distributed Database Design
    Any database design must address the following issues:
    1. Designing the conceptual schema
    2. Designing the physical storage
    Distributing the data adds two more:
    3. Designing how to fragment the data
    4. Designing how to allocate fragments to sites
  • 33.
    Distributed Database Design
    Objectives of the design of data distribution:
    • Processing locality
    • Availability and reliability of distributed data
    • Workload distribution
    • Storage cost and availability
    • Distributed database design
  • 34.
    Distributed Database Design
    Two approaches to design:
    1. Top-down approach
       • Start by designing the global schema, then design the fragmentation of the database, and then allocate the fragments to the sites, creating the physical images.
       • Suitable for systems that are developed from scratch.
    2. Bottom-up approach, which integrates existing local schemata and requires:
       • Selecting a common database model for describing the global schema of the database.
       • Translating each local schema into the common data model.
       • Integrating the local schemata into a common global schema.
  • 35.
    The Design of Database Fragmentation
    Horizontal Fragmentation (Primary)
    Let P = {p1, p2, ..., pn} be a set of simple predicates. For P to define the fragmentation correctly and efficiently, P must be complete and minimal.
    1. P is complete if and only if any two tuples belonging to the same fragment are referenced with the same probability by any application.
    2. P is minimal if all its predicates are relevant.
  • 36.
    The Design of Database Fragmentation
    Horizontal Fragmentation (Derived)
    • A distributed join is a join between horizontally fragmented relations; it is represented by a join graph.
    • Join graphs can be:
       • Total
       • Reduced
       • Simple
       • Partitioned
    Derived fragments: Ri = R SJ_F Si, where the Si are the fragments of the relation S from which the fragmentation of R is derived and F is the semi-join condition.
  • 37.
    The Design of Database Fragmentation
    Vertical Fragmentation
    1. Split approach
    2. Grouping approach
  • 38.
    The Design of Database Fragmentation
    Mixed Fragmentation
    1. Applying vertical fragmentation to horizontal fragments
    2. Applying horizontal fragmentation to vertical fragments
  • 39.
    The Allocation of Fragments
    General criteria for fragment allocation:
    1. Redundant allocation (fragments may be replicated)
    2. Non-redundant allocation
    With replication, the allocation problem is more complex because:
    1. The degree of replication of each fragment becomes a variable of the problem.
    2. Modeling read applications is complicated by the fact that the applications can now select among several alternative sites for accessing fragments.
  • 40.
    The Allocation of Fragments
    To determine a redundant allocation of fragments, either of the following methods can be used:
    1. All beneficial sites: determine the set of all sites where the benefit of allocating one copy of the fragment is higher than the cost, and allocate a copy of the fragment to each element of this set.
    2. Additional replication: first determine the solution of the non-replicated problem, and then progressively introduce replicated copies, starting from the most beneficial; the process terminates when no additional replication is beneficial.
  • 41.
    Measure of costs and benefits of fragment allocation
    Some definitions:
    • $i$ is the fragment index
    • $j$ is the site index
    • $k$ is the application index
    • $f_{kj}$ is the frequency of application $k$ at site $j$
    • $r_{ki}$ is the number of retrieval references of application $k$ to fragment $i$
    • $u_{ki}$ is the number of update references of application $k$ to fragment $i$
    • $n_{ki} = r_{ki} + u_{ki}$ is the total number of references of application $k$ to fragment $i$
  • 42.
    Measure of costs and benefits of fragment allocation
    Horizontal fragmentation:
    1. Using the 'best-fit' approach for a non-replicated allocation, we place $R_i$ at the site where the number of references to $R_i$ is maximum. The number of local references to $R_i$ at site $j$ is
       $B_{ij} = \sum_k f_{kj} \, n_{ki}$
       $R_i$ is allocated at the site $j^*$ for which $B_{ij^*}$ is maximum.
    2. Using the 'all beneficial sites' method for replicated allocation, we place $R_i$ at all sites $j$ where the benefit of the retrieval references of local applications is larger than the cost of the update references to $R_i$ from applications at any other site. $B_{ij}$ is evaluated as the difference
       $B_{ij} = \sum_k f_{kj} \, r_{ki} - C \sum_k \sum_{j' \neq j} f_{kj'} \, u_{ki}$
       where $C$ is a constant that measures the ratio between the cost of an update access and a retrieval access. $R_i$ is allocated at every site $j$ for which $B_{ij}$ is positive.
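    Both measures can be computed directly from the frequency and reference counts. The Python sketch below uses an invented workload (two applications, three sites, one fragment) and hypothetical function names. It takes the total number of references as n_ki = r_ki + u_ki, which is an assumption about the intended definition.

```python
# Invented workload for a single fragment Ri.
f = [[10, 0, 1],   # f[k][j]: frequency of application k at site j
     [0, 8, 1]]
r = [6, 5]          # r[k]: retrieval references of application k to Ri
u = [1, 1]          # u[k]: update references of application k to Ri
C = 2.0             # cost ratio between an update and a retrieval access

def best_fit(f, r, u):
    """Non-replicated allocation: site with the most local references."""
    apps, sites = range(len(f)), range(len(f[0]))
    B = [sum(f[k][j] * (r[k] + u[k]) for k in apps) for j in sites]
    return max(sites, key=lambda j: B[j]), B

def all_beneficial_sites(f, r, u, C):
    """Replicated allocation: every site whose benefit B[j] is positive."""
    apps, sites = range(len(f)), range(len(f[0]))
    B = []
    for j in sites:
        local_reads = sum(f[k][j] * r[k] for k in apps)
        remote_updates = sum(f[k][jp] * u[k]
                             for k in apps for jp in sites if jp != j)
        B.append(local_reads - C * remote_updates)
    return [j for j in sites if B[j] > 0], B

site, local_refs = best_fit(f, r, u)
chosen, benefits = all_beneficial_sites(f, r, u, C)
print(site, local_refs)   # 0 [70, 48, 13]
print(chosen, benefits)   # [0, 1] [40.0, 16.0, -25.0]
```

    In this workload site 2 issues few reads but would still have to receive every remote update, so a copy there costs more than it saves.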
  • 43.
    Measure of costs and benefits of fragment allocation
    3. Using the 'additional replication' method for replicated allocation, we can measure the benefit of placing a new copy of $R_i$ in terms of increased reliability and availability of the system. Let $d_i$ denote the degree of redundancy of $R_i$, and let $F_i$ denote the benefit of having $R_i$ fully replicated at each site. The following function was introduced to measure this benefit:
       $\beta(d_i) = (1 - 2^{1 - d_i}) \, F_i$
       Note that $\beta(1) = 0$, $\beta(2) = F_i / 2$, $\beta(3) = 3 F_i / 4$, and so on. We evaluate the benefit of introducing a new copy of $R_i$ at site $j$ by modifying the formula of case 2 as follows:
       $B_{ij} = \sum_k f_{kj} \, r_{ki} - C \sum_k \sum_{j' \neq j} f_{kj'} \, u_{ki} + \beta(d_i)$
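    A greedy version of the 'additional replication' method might look like the Python sketch below. It is an interpretation, not the lecture's algorithm: the workload is invented, the function names are hypothetical, and d_i is taken to be the degree of redundancy after the candidate copy has been added.

```python
def beta(d, F):
    """Reliability/availability benefit of d copies: (1 - 2**(1 - d)) * F."""
    return (1 - 2 ** (1 - d)) * F

def additional_replication(f, r, u, C, F, first_site):
    """Start from one copy at first_site and keep adding the most
    beneficial copy while its benefit B_ij stays positive."""
    apps, sites = range(len(f)), range(len(f[0]))

    def base_benefit(j):
        local_reads = sum(f[k][j] * r[k] for k in apps)
        remote_updates = sum(f[k][jp] * u[k]
                             for k in apps for jp in sites if jp != j)
        return local_reads - C * remote_updates

    copies = [first_site]
    while len(copies) < len(sites):
        d = len(copies) + 1  # assumed: degree of redundancy after adding a copy
        candidates = [j for j in sites if j not in copies]
        best = max(candidates, key=lambda j: base_benefit(j) + beta(d, F))
        if base_benefit(best) + beta(d, F) <= 0:
            break            # no additional replication is beneficial
        copies.append(best)
    return copies

# Invented workload: 2 applications, 3 sites, one fragment.
f = [[10, 0, 1], [0, 8, 1]]
r, u, C = [6, 5], [1, 1], 2.0

print(additional_replication(f, r, u, C, F=0, first_site=0))
print(additional_replication(f, r, u, C, F=40, first_site=0))
```

    With F = 0 the result matches the 'all beneficial sites' choice for this workload. A large enough F makes even a copy at the update-heavy site worthwhile for the sake of availability.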
  • 45.
    Motivation is what gets you started and Habit is what keeps you going…
    Thanks a lot for patient listening!! Questions?
    You can reach me at mangeshwanjari[at]gmail.com