CS 222
Database Management System
             Spring 2010-11


           Lecture 5
Database Design (Decomposition)

            Korra Sathya Babu
         Department of Computer Science
                 NIT Rourkela
Recap
•       Good database design is needed to reduce redundancy and
        update anomalies
•       The theory of functional dependencies has been covered
        in full
•       Better design requires schema refinement
•       One approach to schema refinement is synthesis of
        relations




    2/11/2013              Database Design                  2
Relation Decomposition




[Figure: R decomposed into R1 and R2; the attributes of R are split into the groups R − X⁺, X and X⁺ − X]



Relation Decomposition
•       Reason for decomposition
        •       To reduce redundancy and anomalies


•       Rules for synthesis
        •       Lossless Join (Information Preservation)

        •       Dependency Preservation (a special case of information
                preservation)


•       Decomposition (synthesis) types
        •       By functional dependency
        •       By multi-valued dependency
        •       By Join dependency


Lossless Join
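•       Definition
                A decomposition D = {R1, R2, ..., Rm} of R has the lossless
                join property with respect to the set of dependencies F on
                R if, for every relation r of R that satisfies F, the following
                holds:   $\bowtie(\pi_{R_1}(r), \ldots, \pi_{R_m}(r)) = r$

                where $\bowtie$ is the natural join of all the relations in D
                and $\pi_{R_i}(r)$ is the projection of r onto the attributes of Ri



•       The word loss in lossless refers to loss of
        information, not to loss of tuples: a lossy decomposition
        actually produces extra (spurious) tuples when rejoined.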
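To make the definition concrete, here is a small Python sketch that computes π and ⋈ on an explicit relation instance. The relation encoding (attribute names plus a set of tuples), the helper names `project`, `natural_join` and `join_of_projections`, and the schema R(A, B, C) with F = {A → B} are all assumptions for illustration. Checking a single instance can only show that a decomposition is lossy; showing it is lossless for every r that satisfies F needs the matrix test on the following slides.

```python
from itertools import product

# A relation is modelled as (attribute_names, set_of_tuples); illustrative encoding only.


def project(attrs, rows, onto):
    """pi_onto(r): keep only the columns in `onto`; the set drops duplicates."""
    idx = [attrs.index(a) for a in onto]
    return tuple(onto), {tuple(row[i] for i in idx) for row in rows}


def natural_join(left, right):
    """Natural join of two relations on their common attribute names."""
    l_attrs, l_rows = left
    r_attrs, r_rows = right
    common = [a for a in l_attrs if a in r_attrs]
    extra = [a for a in r_attrs if a not in l_attrs]
    out = set()
    for t1, t2 in product(l_rows, r_rows):
        if all(t1[l_attrs.index(a)] == t2[r_attrs.index(a)] for a in common):
            out.add(t1 + tuple(t2[r_attrs.index(a)] for a in extra))
    return tuple(l_attrs) + tuple(extra), out


def join_of_projections(attrs, rows, decomposition):
    """Join pi_R1(r), ..., pi_Rm(r) and return tuples in the column order of `attrs`."""
    result = project(attrs, rows, decomposition[0])
    for schema in decomposition[1:]:
        result = natural_join(result, project(attrs, rows, schema))
    res_attrs, res_rows = result
    order = [res_attrs.index(a) for a in attrs]
    return {tuple(t[i] for i in order) for t in res_rows}


attrs = ("A", "B", "C")  # hypothetical R(A, B, C) with F = {A -> B}
r1 = {(1, "x", "p"), (1, "x", "q"), (2, "y", "p")}
print(join_of_projections(attrs, r1, [("A", "B"), ("A", "C")]) == r1)  # True

r2 = {(1, "x", "p"), (2, "x", "q")}
joined = join_of_projections(attrs, r2, [("A", "B"), ("B", "C")])
print(len(joined), r2 <= joined)  # 4 True
```

The decomposition {AB, BC} rejoins to r2 plus two extra tuples: no tuple is lost, yet the result no longer says which A value goes with which C value.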




Test for Lossless Join
Input: A relation R, a decomposition D = {R1, R2, ..., Rm} of R, and a set F of
functional dependencies

Lossless Join Test Algorithm:
Step 1: Create an initial matrix S with one row i for each relation Ri in D, and one
         column j for each attribute Aj in R.
Step 2: Set S(i, j) := bij for all matrix entries.
Step 3: For each row i representing relation schema Ri:
                           for each column j representing Aj:
                                    if relation Ri includes attribute Aj then
                                               set S(i, j) := aj
Step 4: Repeat the following loop until a complete loop execution results in no
                           changes to S.




Test for Lossless Join
Lossless Join Test Algorithm: continues…
Step 4 (continued): repeat until a complete pass makes no changes to S:
         For each functional dependency X → Y in F:
                  for all rows in S that have the same symbols in the columns
                  corresponding to the attributes in X,
                  make the symbols in each column that corresponds to
                  an attribute in Y the same in all these rows, as follows:
                  if any of the rows has an “a” symbol for the column,
                  set the other rows to that same “a” symbol in the column.
                  If no “a” symbol exists for the attribute in any of the
                  rows, choose one of the “b” symbols that appear in one
                  of the rows for the attribute and set the other rows to
                  that same “b” symbol in the column.

Step 5: If a row is made up entirely of “a” symbols, then the
                 decomposition has the lossless join property;
                 otherwise it does not.
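The five steps translate almost line for line into code. This is a sketch, assuming FDs are given as (lhs, rhs) lists of attribute names; the function name `is_lossless` and the symbol encoding ("a", j) / ("b", i, j) are illustrative choices, not part of the lecture. One deliberate deviation from the slide wording: when two symbols are equated, the sketch renames the replaced symbol everywhere in that column (the usual chase convention) rather than only in the matching rows, so that rows equated earlier stay equal.

```python
def is_lossless(attributes, decomposition, fds):
    """Matrix test for the lossless join property (Steps 1-5).

    attributes    -- attribute names of R, in column order
    decomposition -- attribute sets R1, ..., Rm
    fds           -- (lhs, rhs) pairs of attribute lists, e.g. (["SSN"], ["ENAME"])
    Symbol aj is encoded as ("a", j) and bij as ("b", i, j).
    """
    # Steps 1-3: bij everywhere, then aj wherever Ri contains Aj.
    S = [{A: ("a", j) if A in Ri else ("b", i, j) for j, A in enumerate(attributes)}
         for i, Ri in enumerate(decomposition)]

    # Step 4: apply every FD until a complete pass changes nothing.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that carry the same symbols in the X columns.
            groups = {}
            for row in S:
                groups.setdefault(tuple(row[A] for A in lhs), []).append(row)
            for rows in groups.values():
                for A in rhs:
                    symbols = {row[A] for row in rows}
                    if len(symbols) < 2:
                        continue
                    a_symbols = [s for s in symbols if s[0] == "a"]
                    target = a_symbols[0] if a_symbols else min(symbols)
                    # Rename the other symbols throughout the column.
                    for row in S:
                        if row[A] in symbols and row[A] != target:
                            row[A] = target
                            changed = True

    # Step 5: lossless iff some row is made up entirely of "a" symbols.
    return any(all(row[A][0] == "a" for A in attributes) for row in S)


attributes = ["SSN", "ENAME", "PNUM", "PNAME", "PLOCATION", "hours"]
fds = [(["SSN"], ["ENAME"]),
       (["PNUM"], ["PNAME", "PLOCATION"]),
       (["SSN", "PNUM"], ["hours"])]
example1 = [{"SSN", "ENAME"}, {"PNUM", "PNAME", "PLOCATION"}, {"SSN", "PNUM", "hours"}]
example2 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUM", "hours", "PNAME", "PLOCATION"}]
print(is_lossless(attributes, example1, fds))  # True
print(is_lossless(attributes, example2, fds))  # False
```

The two calls reproduce the decompositions of Example 1 and Example 2 on the following slides.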

Example 1

    Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)

     F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}

     Decomposition D = {R1, R2, R3}:
      R1(SSN, ENAME)
      R2(PNUM, PNAME, PLOCATION)
      R3(SSN, PNUM, hours)

Example 1

              A1     A2     A3         A4        A5        A6
             SSN   ENAME   PNUM      PNAME    PLOCATION   hours

   Initial matrix (Steps 1 and 2):
        R1   b11    b12     b13        b14       b15      b16
        R2   b21    b22     b23        b24       b25      b26
        R3   b31    b32     b33        b34       b35      b36

   After Step 3 (aj wherever Ri contains Aj):
        R1   a1     a2      b13        b14       b15      b16
        R2   b21    b22     a3         a4        a5       b26
        R3   a1     b32     a3         b34       b35      a6




Example 1
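   Applying SSN → ENAME (rows R1 and R3 agree on SSN = a1):
        R1   a1     a2      b13         b14        b15      b16
        R2   b21    b22     a3          a4         a5       b26
        R3   a1     a2      a3          b34        b35      a6

   Applying PNUM → {PNAME, PLOCATION} (rows R2 and R3 agree on PNUM = a3):
        R1   a1     a2      b13        b14         b15      b16
        R2   b21    b22     a3         a4          a5       b26
        R3   a1     a2      a3         a4          a5       a6

   Row R3 now consists entirely of “a” symbols, so the decomposition
   has the lossless join property.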



Example 2

  Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)

      F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}

  Decomposition D = {R1, R2}:
   R1(ENAME, PLOCATION)
   R2(SSN, PNUM, hours, PNAME, PLOCATION)




Example 2

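              A1     A2     A3         A4        A5        A6
             SSN   ENAME   PNUM      PNAME    PLOCATION   hours

   Initial matrix:
        R1   b11    b12     b13        b14       b15      b16
        R2   b21    b22     b23        b24       b25      b26

   After Step 3:
        R1   b11    a2      b13        b14       a5       b16
        R2   a1     b22     a3         a4        a5       a6

   None of SSN → ENAME, PNUM → {PNAME, PLOCATION} or {SSN, PNUM} → hours
   can change the matrix, because R1 and R2 never agree on SSN or PNUM.
   No row consists entirely of “a” symbols, so the decomposition is lossy.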


Problems
•       Check whether the following decompositions are
        lossy or lossless
        •       Let R=ABCDE, R1=AD, R2=AB, R3=BE, R4=CDE, R5=AE.
                Let F={A→C, B→C, C→D, DE→C, CE→A}
        •       R(XYZWQ), F={X→Z, Y→Z, Z→W, WQ→Z, ZQ→X}.
                R1(XW), R2(XY), R3(YQ), R4(ZWQ), R5(XQ)
        •       R(XYZ), F={X→Y, Z→Y}. R1(XY), R2(YZ)
        •       R(XYWZPQ), D={R1(ZPQ), R2(XYZPQ)}
                F={XY→W, XW→P, PQ→Z, XY→Q}




Dependency Preservation
R is decomposed (normalised) into R1, …, Rn
S - the set of FDs for R
S1, …, Sn - the sets of FDs for R1, …, Rn (each Si contains the FDs
   of S+ that refer only to the attributes of Ri)
$S' = S_1 \cup \cdots \cup S_n$ (in general, $S' \neq S$)

The decomposition is dependency preserving if
                                       $S'^{+} = S^{+}$




Test for Dependency Preservation
Input: decomposition D = {R1, …, Rk} and a set of FDs F
Dependency Preservation Test:
Step 1: For each X → Y ∈ F, initialize a set T of attributes with the attributes of X (the
determinant of the FD under consideration), i.e. set T = X, and continue with Step 2.
Step 2: Repeat Step 3 until the set T no longer changes. When T no longer
changes, continue with Step 4.
Step 3: For each relation Ri (1 ≤ i ≤ k) of the input decomposition, apply
the corresponding Ri operation to the set of attributes T with respect to the set
of dependencies F, i.e. T = T ∪ ((T ∩ Ri)+ ∩ Ri), where + is the attribute closure under F.
Step 4: Test whether Y (the right-hand side of the FD under consideration)
satisfies Y ⊆ T. If the answer is negative, i.e. Y is not a subset of T, stop the
algorithm and report that the decomposition does not preserve the FD. If
the answer is affirmative, i.e. Y ⊆ T, then X → Y ∈ G+, where G is the union of the
projections of F onto R1, …, Rk. If there are other FDs in F that need to be
considered, repeat from Step 1 with an FD that has not been considered before.
If there are no more FDs in F, report that the decomposition preserves F.
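A minimal Python sketch of the test above, assuming FDs are given as pairs of attribute sets; `closure`, `preserves_dependencies` and the data layout are illustrative names, not from the lecture. It is run on the SALES decomposition from the 3NF slide and on the BCNF decomposition of SCHOOL on Teacher → Subject, i.e. (Student, Teacher) and (Teacher, Subject).

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """Return (True, None) if every FD is preserved, else (False, first_lost_fd)."""
    for lhs, rhs in fds:
        t = set(lhs)                        # Step 1: T = X
        changed = True
        while changed:                      # Step 2: repeat Step 3 until T is stable
            changed = False
            for ri in decomposition:        # Step 3: T = T ∪ ((T ∩ Ri)+ ∩ Ri)
                gained = closure(t & set(ri), fds) & set(ri)
                if not gained <= t:
                    t |= gained
                    changed = True
        if not set(rhs) <= t:               # Step 4: is Y ⊆ T?
            return False, (lhs, rhs)
    return True, None


sales_fds = [({"Customer_ID"}, {"Customer_Name", "SalesPerson", "Region"}),
             ({"SalesPerson"}, {"Region"})]
sales_3nf = [{"Customer_ID", "Customer_Name", "SalesPerson"}, {"SalesPerson", "Region"}]
print(preserves_dependencies(sales_3nf, sales_fds))  # (True, None)

school_fds = [({"Student", "Subject"}, {"Teacher"}),
              ({"Student", "Teacher"}, {"Subject"}),
              ({"Teacher"}, {"Subject"})]
school_bcnf = [{"Student", "Teacher"}, {"Teacher", "Subject"}]
ok, lost = preserves_dependencies(school_bcnf, school_fds)
print(ok, sorted(lost[0]), "->", sorted(lost[1]))  # False ['Student', 'Subject'] -> ['Teacher']
```

The closure is always computed with respect to the full F, never with respect to the projected FDs, which is what lets the test avoid computing each projection explicitly.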

Problems

1. Given R(XYZ) and the set F = {Z→X, XY→Z}. Check if the
   decomposition R1(XY) and R2(XZ) preserves the set F.
2. Given R(ABCD) and the set F = {A→B, C→D}. Check if the
   decomposition R1(AB) and R2(CD) preserves the set F.
3. Determine if the decomposition D={R1(XY), R2(YZ), R3(ZW)} of the
   relation R(WXYZ) preserves the dependencies of the set F={X→Y,
   Y→Z, Z→W, W→X}.
4. Given R(ABCDEF) and the set F = {A→B, CD→F, AC→E, D→F}. Check
   if the decomposition R1(ACE), R2(CD), R3(DF) and R4(AB) preserves
   the set F.




Normalization
•       Normalization is the process of successively reducing
        a given set of relations to a better form (with less
        redundancy and fewer anomalies)
•       The degree of normalization to aim for
        depends on the workload (a tradeoff between fast
        access and maintaining integrity)
•       Assumes that all possible functional dependencies
        are known
        •       First construct a minimal set of FDs
        •       Then apply algorithms that construct the required normal
                form
•       Additional criteria may be needed to ensure that the
        set of relations in a relational database is
        satisfactory
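The slide only says to first construct a minimal set of FDs. One common way to do this is the three-step minimal-cover construction sketched below, which the lecture does not spell out: split right-hand sides into single attributes, remove extraneous left-hand-side attributes, then drop FDs implied by the rest. FDs are assumed to be pairs of attribute sets, and `minimal_cover` and `show` are illustrative names. A minimal cover is not unique; the result can depend on the order in which FDs are processed.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Return a minimal cover of fds (a list of (lhs, rhs) attribute-set pairs)."""
    # 1. Single attribute on every right-hand side.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset({a}))
            if fd not in g:
                g.append(fd)
    # 2. Drop extraneous left-hand-side attributes: b is extraneous in X -> A
    #    if A is already in the closure of X - {b}.
    for k, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[k] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # step 2 may have created duplicates
    # 3. Drop FDs implied by the remaining ones.
    i = 0
    while i < len(g):
        lhs, rhs = g[i]
        rest = g[:i] + g[i + 1:]
        if rhs <= closure(lhs, rest):
            g = rest
        else:
            i += 1
    return g


def show(cover):
    return ", ".join(f"{''.join(sorted(lhs))} -> {''.join(sorted(rhs))}" for lhs, rhs in cover)


F = [({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A"}, {"B"}), ({"A", "B"}, {"C"})]
print(show(minimal_cover(F)))  # A -> B, B -> C
```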
1 NF
•       A relation is in first normal form (1NF) if it does not
        contain any repeating columns or repeating groups
        of columns
•       Normalizing to 1NF converts complex data
        structures into simpler, more stable data structures
•       A relvar is in 1NF if and only if, in every legal value
        of that relvar, every tuple contains exactly one
        value for each attribute
•       First Normal Form (1NF)
        •       Unique rows
        •       All attributes are atomic




2 NF
•       A table is in second normal form (2NF) if it is in
        first normal form and every non-key column in
        the table depends on the entire primary key
•       The following relation is in 1NF but not in 2NF
        (primary key: {Emp_ID, Course})

      EMPLOYEE2(Emp_ID, Name, Dept, Salary, Course, Date_Completed)

      Functional dependencies:
      1. Emp_ID → Name, Dept, Salary                     (partial key dependency)
      2. Emp_ID, Course → Date_Completed


       Decompose into 2NF
       EMPLOYEE1(Emp_ID, Name, Dept, Salary)
       Functional dependency: Emp_ID → Name, Dept, Salary

       EMPCOURSE(Emp_ID, Course, Date_Completed)
       Functional dependency: Emp_ID, Course → Date_Completed
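Partial dependencies can be found mechanically with attribute closures. The sketch below assumes FDs are pairs of attribute sets and that the given primary key is the only key, so "non-key" simply means "not in that key", matching the slide's definition. For every proper subset of the key it reports the non-key attributes that subset already determines. The names `closure` and `partial_dependencies` are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(attributes, key, fds):
    """Map each proper subset of `key` to the non-key attributes it alone determines."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = determined
    return found


attributes = ["Emp_ID", "Name", "Dept", "Salary", "Course", "Date_Completed"]
fds = [({"Emp_ID"}, {"Name", "Dept", "Salary"}),
       ({"Emp_ID", "Course"}, {"Date_Completed"})]
for subset, attrs in partial_dependencies(attributes, ["Emp_ID", "Course"], fds).items():
    print(subset, "->", sorted(attrs))
# ('Emp_ID',) -> ['Dept', 'Name', 'Salary']
```

An empty result for each decomposed table (for example EMPCOURSE with key {Emp_ID, Course}) indicates that no partial dependency remains.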


3 NF
•       A table is in third normal form (3NF) if it is in
        second normal form and every non-key column
        in the table depends non-transitively on the entire
        primary key

      SALES(Customer_ID, Customer_Name, SalesPerson, Region)
      Functional dependencies:
      1. Customer_ID → Customer_Name, SalesPerson, Region
      2. SalesPerson → Region                                   (transitive dependency)



      Decompose into 3NF
      SALES1(Customer_ID, Customer_Name, SalesPerson)
      Functional dependency: Customer_ID → Customer_Name, SalesPerson

      SPERSON(SalesPerson, Region)
      Functional dependency: SalesPerson → Region
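A brute-force sketch of a 3NF check, assuming small schemas (it enumerates attribute subsets to find the candidate keys) and FDs given as pairs of attribute sets. It uses the standard key-based formulation rather than the slide's wording: an FD X → A with A ∉ X violates 3NF when X is not a superkey and A is not part of any candidate key. For the relation on which F is given, checking the FDs in F is sufficient. Function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is all of R (brute force)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # a superset of a known key is not minimal
            if closure(combo, fds) >= set(attributes):
                keys.append(combo)
    return keys


def third_nf_violations(attributes, fds):
    """FDs X -> A (A not in X) where X is not a superkey and A is not prime."""
    everything = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= everything:
            continue  # X is a superkey
        for a in sorted(set(rhs) - set(lhs)):
            if a not in prime:
                violations.append((tuple(sorted(lhs)), a))
    return violations


sales = ["Customer_ID", "Customer_Name", "SalesPerson", "Region"]
sales_fds = [({"Customer_ID"}, {"Customer_Name", "SalesPerson", "Region"}),
             ({"SalesPerson"}, {"Region"})]
print(candidate_keys(sales, sales_fds))       # [('Customer_ID',)]
print(third_nf_violations(sales, sales_fds))  # [(('SalesPerson',), 'Region')]
```

Run on SALES1 and SPERSON with their own FDs, the same function returns an empty list.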



BCNF
•       A table is in Boyce-Codd normal form (BCNF) if
        every column on which some other column is fully
        functionally dependent is a candidate key
        of the table
•       Equivalently, a table is in BCNF if the only determinants
        in the table are candidate keys
      SCHOOL(Student, Subject, Teacher)
      Functional dependencies:
      1. Student, Subject → Teacher
      2. Student, Teacher → Subject
      3. Teacher → Subject

      Decompose into BCNF (on Teacher → Subject)
      SCHOOL1(Student, Teacher)
      SCHOOL2(Teacher, Subject)

      Teacher → Subject (which implies Student, Teacher → Subject) survives,
      but Student, Subject → Teacher is no longer preserved
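BCNF can be tested directly from the definition: every non-trivial FD X → Y must have a superkey X. A minimal sketch, assuming FDs are pairs of attribute sets; `bcnf_violations` is an illustrative name. For the original relation it is enough to check the FDs in F. For a relation produced by decomposition, one must instead check the FDs of F⁺ projected onto it; this sketch leaves that to the caller, and the projections below are written out by hand.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs X -> Y in fds whose determinant X is not a superkey."""
    everything = set(attributes)
    return [(tuple(sorted(lhs)), tuple(sorted(rhs)))
            for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= everything]


school_fds = [({"Student", "Subject"}, {"Teacher"}),
              ({"Student", "Teacher"}, {"Subject"}),
              ({"Teacher"}, {"Subject"})]
print(bcnf_violations(["Student", "Subject", "Teacher"], school_fds))
# [(('Teacher',), ('Subject',))]

# After splitting on Teacher -> Subject (projected FDs written by hand):
print(bcnf_violations(["Teacher", "Subject"], [({"Teacher"}, {"Subject"})]))  # []
print(bcnf_violations(["Student", "Teacher"], []))                            # []
```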

Comparison between 3NF and BCNF

•       It is always possible to decompose a relation into
        relations in 3NF such that:
               the decomposition is lossless, and
               the dependencies are preserved


•       It is always possible to decompose a relation into
        relations in BCNF such that:
               the decomposition is lossless,
               but it may not be possible to preserve the dependencies;
               in exchange, BCNF may eliminate more redundancy than 3NF




Multivalued Dependency
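        Let R be a relation schema and let $\alpha \subseteq R$ and $\beta \subseteq R$. The
        multivalued dependency

                 $\alpha \twoheadrightarrow \beta$

holds on R if in any legal relation r(R), for all pairs of
tuples t1 and t2 in r such that $t_1[\alpha] = t_2[\alpha]$, there exist tuples t3 and t4
in r such that:
                  $t_1[\alpha] = t_2[\alpha] = t_3[\alpha] = t_4[\alpha]$
                  $t_3[\beta] = t_1[\beta]$
                  $t_3[R - \beta] = t_2[R - \beta]$
                  $t_4[\beta] = t_2[\beta]$
                  $t_4[R - \beta] = t_1[R - \beta]$

•       An MVD is a tuple-generating dependency: rather than ruling
        tuples out, it requires certain tuples to be present in r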
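The definition can be checked directly on a relation instance by constructing t3 for every pair (t1, t2) that agrees on α. This is a sketch with an illustrative function name `mvd_holds`, applied to the Student table from the 4NF slides. A single instance can show that an MVD fails, but not that it holds in every legal relation.

```python
def mvd_holds(attributes, rows, alpha, beta):
    """Check alpha ->> beta on one relation instance (rows: set of tuples)."""
    idx = {a: i for i, a in enumerate(attributes)}

    def proj(t, attrs):
        return tuple(t[idx[a]] for a in attrs)

    for t1 in rows:
        for t2 in rows:
            if proj(t1, alpha) != proj(t2, alpha):
                continue
            # t3 agrees with t1 on beta and with t2 on R - beta.
            t3 = tuple(t1[idx[a]] if a in beta else t2[idx[a]] for a in attributes)
            if t3 not in rows:
                return False
            # t4 (beta from t2, R - beta from t1) is the t3 of the ordered
            # pair (t2, t1), which the loops also visit.
    return True


attributes = ("Student", "Subject", "Language")
student = {("Geeta", "Mythology", "English"), ("Geeta", "Psychology", "English"),
           ("Geeta", "Mythology", "Hindi"), ("Geeta", "Psychology", "Hindi"),
           ("Shekher", "Gardening", "English")}
print(mvd_holds(attributes, student, ["Student"], ["Subject"]))  # True
print(mvd_holds(attributes, student - {("Geeta", "Psychology", "Hindi")},
                ["Student"], ["Subject"]))                       # False
```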

4 NF
•       A table is in fourth normal form (4NF) if it is in
        BCNF and does not have any independent multi-
        valued parts of the primary key
•       If there are two attributes A and B and, for a given
        value of A, there exist multiple values of B that are
        independent of the remaining attributes, then
        we say that an MVD A →→ B exists
•       The normal forms after BCNF are mainly of theoretical
        interest




4 NF
    Student Table
            Student       Subject               Language
            Geeta        Mythology               English
            Geeta        Psychology              English
            Geeta        Mythology               Hindi
            Geeta        Psychology              Hindi
            Shekher      Gardening               English


      Student →→ Subject
      Student →→ Language




4 NF

 Split the independent multi-valued components of the
 primary key into two tables.
 The primary key of the Student table is (Student, Subject, Language).

  Student_Subject Table                     Student_Language Table
        Student       Subject                     Student      Language
        Geeta         Mythology                   Geeta         English
        Geeta         Psychology                  Geeta         Hindi
        Shekher       Gardening                   Shekher       English




      This removes the update anomaly: recording a new language for Geeta
      now takes one row instead of one row per subject.
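As a quick check that this split loses nothing, the sketch below projects the Student table onto the two new tables and joins them back on Student, using plain Python sets. The variable names are illustrative.

```python
student = {("Geeta", "Mythology", "English"), ("Geeta", "Psychology", "English"),
           ("Geeta", "Mythology", "Hindi"), ("Geeta", "Psychology", "Hindi"),
           ("Shekher", "Gardening", "English")}

# Projections onto the two 4NF tables (sets drop the duplicates).
student_subject = {(st, subj) for st, subj, _ in student}
student_language = {(st, lang) for st, _, lang in student}
print(len(student_subject), len(student_language))  # 3 3

# Natural join on the shared Student column.
rejoined = {(st, subj, lang)
            for st, subj in student_subject
            for st2, lang in student_language
            if st == st2}
print(rejoined == student)  # True
```

Because Student →→ Subject holds, every subject of a student pairs with every language of that student, so the join reproduces exactly the original five rows.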


Surprise: Lossless Decomposition
•       There exist relations that cannot be decomposed without
        loss into two projections, but can be decomposed
        losslessly into three or more




Join Dependency
•       Definition: A relation R satisfies the join
        dependency (JD) *(X, Y, …, Z)
        iff R is equal to the join of its projections on
        X, Y, …, Z, where X, Y, …, Z are subsets of the set of
        attributes of R.
•       Consider the following table of Suppliers (S), the Parts (P)
        they supply and the Locations (L)

        SPL Table                Decomposition into SP and PL
            S    P    L              S    P        P    L
            S1   P1   L2             S1   P1       P1   L2
            S1   P2   L1             S1   P2       P2   L1
            S2   P1   L1             S2   P1       P1   L1
            S1   P1   L1



Join Dependency

        SPL Table                Decomposition into SP and PL
            S    P    L              S    P        P    L
            S1   P1   L2             S1   P1       P1   L2
            S1   P2   L1             S1   P2       P2   L1
            S2   P1   L1             S2   P1       P1   L1
            S1   P1   L1

        Join of SP and PL
            S    P    L
            S1   P1   L2
            S1   P2   L1
            S2   P1   L1
            S1   P1   L1
            S2   P1   L2      (spurious tuple)
Join Dependency

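        SPL Table                Decomposition into SP, PL and LS
            S    P    L              S    P        P    L        L    S
            S1   P1   L2             S1   P1       P1   L2       L2   S1
            S1   P2   L1             S1   P2       P2   L1       L1   S1
            S2   P1   L1             S2   P1       P1   L1       L1   S2
            S1   P1   L1

        Join of SP, PL and LS
            S    P    L
            S1   P1   L2
            S1   P2   L1
            S2   P1   L1
            S1   P1   L1

        The spurious tuple (S2, P1, L2) disappears because (L2, S2) is not in LS,
        so SPL satisfies the join dependency *(SP, PL, LS).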
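The two joins on these slides can be replayed with Python sets. In this sketch the tuples are copied from the SPL table, and the variable names are illustrative.

```python
spl = {("S1", "P1", "L2"), ("S1", "P2", "L1"), ("S2", "P1", "L1"), ("S1", "P1", "L1")}

# The three binary projections.
sp = {(s, p) for s, p, _ in spl}
pl = {(p, l) for _, p, l in spl}
ls = {(l, s) for s, _, l in spl}

# SP join PL (on P) produces one spurious tuple.
sp_pl = {(s, p, l) for s, p in sp for p2, l in pl if p == p2}
print(sorted(sp_pl - spl))  # [('S2', 'P1', 'L2')]

# Joining the result with LS (on L and S) removes it again.
sp_pl_ls = {(s, p, l) for s, p, l in sp_pl if (l, s) in ls}
print(sp_pl_ls == spl)  # True
```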




5 NF
•       A table is in fifth normal form (5NF) if it is in
        fourth normal form and every join dependency in
        the table is implied by the candidate keys
•       It is also called Project-Join Normal Form
        (PJNF)




Normalization
             Un-normalized Relation
                 ↓  Store exactly one atomic value in each cell
                    (intersection of row and column) of a table
             First Normal Form (1NF)
                 ↓  Eliminate partial dependencies
             Second Normal Form (2NF)
                 ↓  Eliminate transitive dependencies
             Third Normal Form (3NF)
                 ↓  Make every determinant a key
             Boyce-Codd Normal Form (BCNF)
                 ↓  Eliminate multi-valued dependencies
                    that are not functional dependencies
             Fourth Normal Form (4NF)
                 ↓  Eliminate join dependencies that are not
                    implied by candidate keys
             Fifth Normal Form (5NF)
Denormalization
•       Denormalization is a process in which we retain or
        introduce some amount of redundancy for faster
        data access
•       It is applied where there is a tradeoff between access speed
        and the cost of maintaining redundant data




Summary
•       Normalization helps to reduce redundancy and some
        anomalies
•       The first three normal forms (1NF, 2NF and 3NF) are of
        practical use, whereas BCNF, 4NF and 5NF are more of
        theoretical interest
•       Denormalization is done for faster access




    2/11/2013               Database Design                   34

More Related Content

What's hot

Olap, oltp and data mining
Olap, oltp and data miningOlap, oltp and data mining
Olap, oltp and data miningzafrii
 
Strassen's matrix multiplication
Strassen's matrix multiplicationStrassen's matrix multiplication
Strassen's matrix multiplicationMegha V
 
Fractional knapsack class 13
Fractional knapsack class 13Fractional knapsack class 13
Fractional knapsack class 13Kumar
 
Database Systems - Relational Data Model (Chapter 2)
Database Systems - Relational Data Model (Chapter 2)Database Systems - Relational Data Model (Chapter 2)
Database Systems - Relational Data Model (Chapter 2)Vidyasagar Mundroy
 
Introduction to Data Center Network Architecture
Introduction to Data Center Network ArchitectureIntroduction to Data Center Network Architecture
Introduction to Data Center Network ArchitectureAnkita Mahajan
 
XML - Data Modeling
XML - Data ModelingXML - Data Modeling
XML - Data ModelingJoel Briza
 
Normalization | (1NF) |(2NF) (3NF)|BCNF| 4NF |5NF
Normalization | (1NF) |(2NF) (3NF)|BCNF| 4NF |5NFNormalization | (1NF) |(2NF) (3NF)|BCNF| 4NF |5NF
Normalization | (1NF) |(2NF) (3NF)|BCNF| 4NF |5NFBiplap Bhattarai
 
Physical and Logical Clocks
Physical and Logical ClocksPhysical and Logical Clocks
Physical and Logical ClocksDilum Bandara
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management SystemAAKANKSHA JAIN
 
Greedy Algorithm - Knapsack Problem
Greedy Algorithm - Knapsack ProblemGreedy Algorithm - Knapsack Problem
Greedy Algorithm - Knapsack ProblemMadhu Bala
 
01 knapsack using backtracking
01 knapsack using backtracking01 knapsack using backtracking
01 knapsack using backtrackingmandlapure
 
UML Diagram - Use-Case diagram, Activity Diagram, Sequence Diagram, Er Diagra...
UML Diagram - Use-Case diagram, Activity Diagram, Sequence Diagram, Er Diagra...UML Diagram - Use-Case diagram, Activity Diagram, Sequence Diagram, Er Diagra...
UML Diagram - Use-Case diagram, Activity Diagram, Sequence Diagram, Er Diagra...Niloy Biswas
 
IOT DATA MANAGEMENT AND COMPUTE STACK.pptx
IOT DATA MANAGEMENT AND COMPUTE STACK.pptxIOT DATA MANAGEMENT AND COMPUTE STACK.pptx
IOT DATA MANAGEMENT AND COMPUTE STACK.pptxMeghaShree665225
 
0/1 knapsack
0/1 knapsack0/1 knapsack
0/1 knapsackAmin Omi
 

What's hot (20)

Temporal database
Temporal databaseTemporal database
Temporal database
 
Olap, oltp and data mining
Olap, oltp and data miningOlap, oltp and data mining
Olap, oltp and data mining
 
Strassen's matrix multiplication
Strassen's matrix multiplicationStrassen's matrix multiplication
Strassen's matrix multiplication
 
Fractional knapsack class 13
Fractional knapsack class 13Fractional knapsack class 13
Fractional knapsack class 13
 
Cloud Service Models
Cloud Service ModelsCloud Service Models
Cloud Service Models
 
Database Systems - Relational Data Model (Chapter 2)
Database Systems - Relational Data Model (Chapter 2)Database Systems - Relational Data Model (Chapter 2)
Database Systems - Relational Data Model (Chapter 2)
 
Csc341 – Lecture 1 network management
Csc341 – Lecture 1 network managementCsc341 – Lecture 1 network management
Csc341 – Lecture 1 network management
 
Introduction to Data Center Network Architecture
Introduction to Data Center Network ArchitectureIntroduction to Data Center Network Architecture
Introduction to Data Center Network Architecture
 
Database fragmentation
Database fragmentationDatabase fragmentation
Database fragmentation
 
XML - Data Modeling
XML - Data ModelingXML - Data Modeling
XML - Data Modeling
 
TCP IP Addressing
TCP IP AddressingTCP IP Addressing
TCP IP Addressing
 
Normalization | (1NF) |(2NF) (3NF)|BCNF| 4NF |5NF
Normalization | (1NF) |(2NF) (3NF)|BCNF| 4NF |5NFNormalization | (1NF) |(2NF) (3NF)|BCNF| 4NF |5NF
Normalization | (1NF) |(2NF) (3NF)|BCNF| 4NF |5NF
 
Physical and Logical Clocks
Physical and Logical ClocksPhysical and Logical Clocks
Physical and Logical Clocks
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management System
 
Greedy Algorithm - Knapsack Problem
Greedy Algorithm - Knapsack ProblemGreedy Algorithm - Knapsack Problem
Greedy Algorithm - Knapsack Problem
 
Less08 users
Less08 usersLess08 users
Less08 users
 
01 knapsack using backtracking
01 knapsack using backtracking01 knapsack using backtracking
01 knapsack using backtracking
 
UML Diagram - Use-Case diagram, Activity Diagram, Sequence Diagram, Er Diagra...
UML Diagram - Use-Case diagram, Activity Diagram, Sequence Diagram, Er Diagra...UML Diagram - Use-Case diagram, Activity Diagram, Sequence Diagram, Er Diagra...
UML Diagram - Use-Case diagram, Activity Diagram, Sequence Diagram, Er Diagra...
 
IOT DATA MANAGEMENT AND COMPUTE STACK.pptx
IOT DATA MANAGEMENT AND COMPUTE STACK.pptxIOT DATA MANAGEMENT AND COMPUTE STACK.pptx
IOT DATA MANAGEMENT AND COMPUTE STACK.pptx
 
0/1 knapsack
0/1 knapsack0/1 knapsack
0/1 knapsack
 

Viewers also liked

7 relational database design algorithms and further dependencies
7 relational database design algorithms and further dependencies7 relational database design algorithms and further dependencies
7 relational database design algorithms and further dependenciesKumar
 
PHP mysql Aggregate functions
PHP mysql Aggregate functionsPHP mysql Aggregate functions
PHP mysql Aggregate functionsMudasir Syed
 
New Perspectives: Access.03
New Perspectives: Access.03New Perspectives: Access.03
New Perspectives: Access.03Anna Stirling
 
Aggregate Function - Database
Aggregate Function - DatabaseAggregate Function - Database
Aggregate Function - DatabaseShahadat153031
 
Database Management System
Database Management SystemDatabase Management System
Database Management SystemVarun Arora
 
functional dependencies with example
functional dependencies with examplefunctional dependencies with example
functional dependencies with exampleSiddhi Viradiya
 

Viewers also liked (8)

7 relational database design algorithms and further dependencies
7 relational database design algorithms and further dependencies7 relational database design algorithms and further dependencies
7 relational database design algorithms and further dependencies
 
PHP mysql Aggregate functions
PHP mysql Aggregate functionsPHP mysql Aggregate functions
PHP mysql Aggregate functions
 
Unit05 dbms
Unit05 dbmsUnit05 dbms
Unit05 dbms
 
New Perspectives: Access.03
New Perspectives: Access.03New Perspectives: Access.03
New Perspectives: Access.03
 
Aggregate Function - Database
Aggregate Function - DatabaseAggregate Function - Database
Aggregate Function - Database
 
Normalization
NormalizationNormalization
Normalization
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
 
functional dependencies with example
functional dependencies with examplefunctional dependencies with example
functional dependencies with example
 

Similar to Dbms4

Similar to Dbms4 (7)

On unifying query languages for RDF streams
On unifying query languages for RDF streamsOn unifying query languages for RDF streams
On unifying query languages for RDF streams
 
Tap Lenh Ho 8051
Tap Lenh Ho 8051Tap Lenh Ho 8051
Tap Lenh Ho 8051
 
At c51ism
At c51ismAt c51ism
At c51ism
 
Tap lenh ho_8051 (1)
Tap lenh ho_8051 (1)Tap lenh ho_8051 (1)
Tap lenh ho_8051 (1)
 
Tap lenh ho 8051
Tap lenh ho 8051Tap lenh ho 8051
Tap lenh ho 8051
 
Microcontroller Instruction Set atmel
Microcontroller Instruction Set atmelMicrocontroller Instruction Set atmel
Microcontroller Instruction Set atmel
 
Algebra
AlgebraAlgebra
Algebra
 

Dbms4

  • 1. CS 222 Database Management System Spring 2010-11 Lecture 5 Database Design (Decomposition) Korra Sathya Babu Department of Computer Science NIT Rourkela
  • 2. Recap • Design of DB is needed to reduce redundancy and anomalies • The theory of Functional Dependency is completely studied • Better Design requires schema refinement • A solution for schema refinement is Synthesis of relations 2/11/2013 Database Design 2
  • 3. Relation Decomposition R1 R-X + X X +- X R2 R 2/11/2013 Database Design 3
  • 4. Relation Decomposition • Reason for Decomposition • A solution for reducing redundancy and Anomalies • Rules for synthesis • Lossless Join (Information Preservation) • Dependency Preservation (a special case of information preservation) • Decomposition (synthesis) types • By functional dependency • By multi-valued dependency • By Join dependency 2/11/2013 Database Design 4
  • 5. Lossless Join • Definition A decomposition D = {R1, R2,..., Rm} of R has the lossless join property with respect to the set of dependencies F on R if, for every relation r of R that satisfies F, the following holds, ( R1(r), ..., Rm(r)) = r where is the natural join of all the relations in D • The word loss in lossless refers to loss of information, not to loss of tuples. 2/11/2013 Database Design 5
  • 6. Test for Lossless Join Input: A relation R, a decomposition D = {R1, R2,..., Rm} of R, and a set F of Functional Dependencies Lossless Join Test Algorithm: Step 1: Create an initial matrix S with one row i for each relation Ri in D, and one column j for each attribute Aj in R. Step 2: Set S(i, j) := bij for all matrix entries Step 3: For each row i representing relation schema Ri Do {for each column j representing Aj do {if relation Ri includes attribute Aj then set S(i, j) := aj;} Step 4: Repeat the following loop until a complete loop execution results in no changes to S. 2/11/2013 Database Design 6
  • 7. Test for Lossless Join Lossless Join Test Algorithm: continues… Step 4: Repeat the following loop until a complete loop execution results in no changes to S. If {for each function dependency X Y in F do for all rows in S which have the same symbols in the columns corresponding to attributes in X do {make the symbols in each column that correspond to an attribute in Y be the same in all these rows as follows: if any of the rows has an “a” symbol for the column, set the other rows to the same “a” symbol in the column. If no “a” symbol exists for the attribute in any of the rows, choose one of the “b” symbols that appear in one of the rows for the attribute and set the other rows to that same “b” symbol in the column;}} Step 5: If a row is made up entirely of “a” symbols, then the decomposition has the lossless join property; otherwise it does not. 2/11/2013 Database Design 7
  • 8. Example 1 Emp_PROJ SSN PNUM hours ENAME PNAME PLOCATION F = {SSN ENAME, PNUM {PNAME, PLOCATION}, {SSN, PNUM} hours} R1 R2 SSN ENAME PNUM PNAME PLOCATION R3 SSN PNUM hours 2/11/2013 Database Design 8
  • 9. Example 1 A1 A2 A3 A4 A5 A6 SSN ENAME PNUM PNAME PLOCATION hours R1 b11 b12 b13 b14 b15 b16 R2 b21 b22 b23 b24 b25 b26 R3 b31 b32 b33 b34 b35 b36 R1 a1 a2 b13 b14 b15 b16 R2 b21 b22 a3 a4 a5 b26 R3 a1 b32 a3 b34 b35 a6 2/11/2013 Database Design 9
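10. Example 1
Applying SSN → ENAME (rows R1 and R3 agree on SSN = a1, so R3 takes a2 in ENAME):
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    a1     a2      b13    b14     b15         b16
R2    b21    b22     a3     a4      a5          b26
R3    a1     a2      a3     b34     b35         a6
Applying PNUM → {PNAME, PLOCATION} (rows R2 and R3 agree on PNUM = a3, so R3 takes a4 and a5):
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    a1     a2      b13    b14     b15         b16
R2    b21    b22     a3     a4      a5          b26
R3    a1     a2      a3     a4      a5          a6
Row R3 is now made up entirely of “a” symbols, so the decomposition is lossless.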
11. Example 2
Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)
F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}
Decomposition:
R1(ENAME, PLOCATION)
R2(SSN, PNUM, hours, PNAME, PLOCATION)
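12. Example 2
Initial matrix S (Steps 1 and 2):
      A1     A2      A3     A4      A5          A6
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    b11    b12     b13    b14     b15         b16
R2    b21    b22     b23    b24     b25         b26
Matrix after Step 3:
      SSN    ENAME   PNUM   PNAME   PLOCATION   hours
R1    b11    a2      b13    b14     a5          b16
R2    a1     b22     a3     a4      a5          a6
Applying SSN → ENAME, PNUM → {PNAME, PLOCATION} and {SSN, PNUM} → hours changes nothing, because R1 and R2 never agree on SSN (b11 ≠ a1) or on PNUM (b13 ≠ a3). No row is made up entirely of “a” symbols, so the decomposition is lossy.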
13. Problems
• Check whether the following decompositions are lossy or lossless:
    • Let R = ABCDE, R1 = AD, R2 = AB, R3 = BE, R4 = CDE, R5 = AE. Let F = {A → C, B → C, C → D, DE → C, CE → A}
    • R(XYZWQ), F = {X → Z, Y → Z, Z → W, WQ → Z, ZQ → X}. R1(XW), R2(XY), R3(YQ), R4(ZWQ), R5(XQ)
    • R(XYZ), F = {X → Y, Z → Y}. R1(XY), R2(YZ)
    • R(XYWZPQ), D = {R1(ZPQ), R2(XYZPQ)}, F = {XYW, XWP, PQZ, XYQ}
14. Dependency Preservation
• R is decomposed (normalised) into R1, …, Rn
• S: the set of FDs for R
• S1, …, Sn: the sets of FDs for R1, …, Rn (each Si contains the FDs of S+ that refer only to the attributes of Ri)
• S′ = S1 ∪ … ∪ Sn (in general, S′ ≠ S)
• The decomposition is dependency preserving if S′+ = S+
15. Test for Dependency Preservation
Input: a decomposition D = {R1, …, Rk} of R and a set of FDs F
Dependency Preservation Test:
Step 1: For an FD X → Y ∈ F, initialize a set T of attributes with the attributes of X (the determinant of the FD under consideration), i.e. set T = X, and continue with Step 2.
Step 2: Repeat Step 3 until the set T no longer changes. When T no longer changes, continue with Step 4.
Step 3: For each relation Ri (1 ≤ i ≤ k) of the input decomposition, apply the Ri-operation to T: T = T ∪ ((T ∩ Ri)+ ∩ Ri), where the closure + is taken with respect to F.
Step 4: Test whether Y (the right-hand side of the FD under consideration) satisfies Y ⊆ T. There are two outcomes:
    • Negative (Y ⊄ T): stop and report that the decomposition does not preserve the FD X → Y.
    • Affirmative (Y ⊆ T): X → Y ∈ G+, where G is the union of the projections of F onto R1, …, Rk. If there are other FDs in F that have not been considered, repeat Step 1 with one of them; if none remain, stop and report that the decomposition preserves F.
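The test above can be sketched in Python, assuming FDs are given as pairs of attribute sets and each Ri as a set of attribute names (single-letter attributes are written as `set("XY")` for brevity). The function names `closure` and `preserves_dependencies` are hypothetical. The usage below takes F = {Z → X, XY → Z} with R1(XY), R2(XZ), one reading of Problem 1 on the next slide.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ with respect to fds, a list of (X, Y) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, decomposition):
    """Return (True, None) if every FD is preserved, else (False, first lost FD)."""
    for lhs, rhs in fds:
        t = set(lhs)                                # Step 1: T = X
        changed = True
        while changed:                              # Step 2: until T is stable
            changed = False
            for ri in decomposition:                # Step 3: the Ri-operation
                gained = closure(t & ri, fds) & ri  # (T & Ri)+ & Ri
                if not gained <= t:
                    t |= gained
                    changed = True
        if not rhs <= t:                            # Step 4: is Y a subset of T?
            return False, (lhs, rhs)
    return True, None


fds = [(set("Z"), set("X")), (set("XY"), set("Z"))]
print(preserves_dependencies(fds, [set("XY"), set("XZ")]))   # reports XY -> Z as lost
```

Computing the closure against the full F (rather than against the projected FDs) is what lets the test avoid enumerating the projections of F+ explicitly.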
16. Problems
1. Given R(XYZ) and the set F = {Z → X, XY → Z}, check whether the decomposition R1(XY) and R2(XZ) preserves the set F.
2. Given R(ABCD) and the set F = {A → B, C → D}, check whether the decomposition R1(AB) and R2(CD) preserves the set F.
3. Determine whether the decomposition D = {R1(XY), R2(YZ), R3(ZW)} of the relation R(WXYZ) preserves the dependencies of the set F = {X → Y, Y → Z, Z → W, W → X}.
4. Given R(ABCDEF) and the set F = {A → B, C → DF, AC → E, D → F}, check whether the decomposition R1(ACE), R2(CD), R3(DF) and R4(AB) preserves the set F.
17. Normalization
• Normalization is the process of successively reducing a given set of relations to a better form (less redundancy and fewer anomalies)
• The normal form one needs to maintain depends on the workflow (a trade-off between fast access and maintenance of integrity)
• It assumes that all possible functional dependencies are known
• First construct a minimal set of FDs
• Then apply algorithms that construct the required normal form
• Additional criteria may be needed to ensure that the set of relations in a relational database is satisfactory
18. 1 NF
• A relation is in first normal form (1NF) if it does not contain any repeating columns or repeating groups of columns
• Bringing a relation into 1NF is the process of converting complex data structures into simpler, more stable data structures
• A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute
• First Normal Form (1NF)
    • Unique rows
    • All attributes are atomic
19. 2 NF
• A table is in second normal form (2NF) if it is in first normal form and every non-key column in the table depends on the entire primary key
• The following relation (primary key: Emp_ID, Course) is in 1NF but not in 2NF:
    EMPLOYEE2(Emp_ID, Name, Dept, Salary, Course, Date_Completed)
    Functional dependencies:
    1. Emp_ID → Name, Dept, Salary   (partial key dependency)
    2. Emp_ID, Course → Date_Completed
• Decompose into 2NF:
    EMPLOYEE1(Emp_ID, Name, Dept, Salary)
    Functional dependency: Emp_ID → Name, Dept, Salary
    EMPCOURSE(Emp_ID, Course, Date_Completed)
    Functional dependency: Emp_ID, Course → Date_Completed
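A partial key dependency can be detected with attribute closures: compute the closure of every proper subset of the primary key and look for non-key attributes in it. The sketch below assumes a single candidate key (here (Emp_ID, Course), as implied by the FDs) and FDs given as pairs of attribute sets; `partial_dependencies` is a hypothetical helper name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ with respect to fds, a list of (X, Y) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(attributes, key, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):           # proper, non-empty subsets only
        for part in combinations(sorted(key), size):
            dependents = closure(part, fds) & non_key
            if dependents:
                found[part] = dependents
    return found


employee2 = ["Emp_ID", "Name", "Dept", "Salary", "Course", "Date_Completed"]
fds = [({"Emp_ID"}, {"Name", "Dept", "Salary"}),
       ({"Emp_ID", "Course"}, {"Date_Completed"})]
print(partial_dependencies(employee2, ("Emp_ID", "Course"), fds))
```

Only Emp_ID is reported, determining Name, Dept and Salary; after the decomposition, EMPLOYEE1 has the single-attribute key Emp_ID and therefore no proper key subsets to check.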
20. 3 NF
• A table is in third normal form (3NF) if it is in second normal form and every non-key column in the table depends non-transitively on the entire primary key
    SALES(Customer_ID, Customer_Name, SalesPerson, Region)
    Functional dependencies:
    1. Customer_ID → Customer_Name, SalesPerson, Region
    2. SalesPerson → Region   (transitive dependency: Customer_ID → SalesPerson → Region)
• Decompose into 3NF:
    SALES1(Customer_ID, Customer_Name, SalesPerson)
    Functional dependency: Customer_ID → Customer_Name, SalesPerson
    SPERSON(SalesPerson, Region)
    Functional dependency: SalesPerson → Region
21. BCNF
• A table is in Boyce-Codd normal form (BCNF) if every column on which some other column is fully functionally dependent is also a candidate key of the table
• Equivalently, a table is in BCNF if the only determinants in the table are candidate keys
    SCHOOL(Student, Subject, Teacher)
    Functional dependencies:
    1. Student, Subject → Teacher
    2. Student, Teacher → Subject
    3. Teacher → Subject
    Teacher is a determinant but not a candidate key, so SCHOOL is not in BCNF.
• Decompose into BCNF (splitting on Teacher → Subject):
    SCHOOL1(Student, Teacher)
    SCHOOL2(Teacher, Subject)
    All functional dependencies vanish except Teacher → Subject
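The BCNF condition can be checked by testing whether the determinant of each non-trivial FD is a superkey, i.e. whether its closure covers every attribute. The sketch below assumes FDs are pairs of attribute sets; `bcnf_violations` is a hypothetical name. Testing only the FDs in F is enough for the relation F is defined on; for a relation produced by decomposition, the FDs projected onto it would have to be considered instead.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ with respect to fds, a list of (X, Y) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs X -> Y whose determinant X is not a superkey."""
    everything = set(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != everything]


school = ["Student", "Subject", "Teacher"]
fds = [({"Student", "Subject"}, {"Teacher"}),
       ({"Student", "Teacher"}, {"Subject"}),
       ({"Teacher"}, {"Subject"})]
print(bcnf_violations(school, fds))   # only Teacher -> Subject is reported
print(bcnf_violations(["Teacher", "Subject"], [({"Teacher"}, {"Subject"})]))   # []
```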
22. Comparison between 3NF and BCNF
• It is always possible to decompose a relation into relations in 3NF such that:
    • the decomposition is lossless
    • the dependencies are preserved
• It is always possible to decompose a relation into relations in BCNF such that:
    • the decomposition is lossless
    • but it may not be possible to preserve dependencies
    • but it may eliminate more redundancy
23. Multivalued Dependency
Let R be a relation schema and let $\alpha \subseteq R$ and $\beta \subseteq R$. The multivalued dependency $\alpha \twoheadrightarrow \beta$ holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that $t_1[\alpha] = t_2[\alpha]$, there exist tuples t3 and t4 in r such that:
        $t_1[\alpha] = t_2[\alpha] = t_3[\alpha] = t_4[\alpha]$
        $t_3[\beta] = t_1[\beta]$
        $t_3[R - \beta] = t_2[R - \beta]$
        $t_4[\beta] = t_2[\beta]$
        $t_4[R - \beta] = t_1[R - \beta]$
• An MVD is a tuple-generating dependency
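The t1…t4 definition translates directly into a check on one relation instance. The sketch below is a hypothetical helper `mvd_holds` that assumes rows are tuples ordered as in `attributes`; it uses the Student table from the 4NF slides. Because it loops over every ordered pair (t1, t2), the required t4 (which is t3 with t1 and t2 swapped) is checked when the pair is visited in the other order.

```python
def mvd_holds(attributes, rows, alpha, beta):
    """Check alpha ->> beta on one relation instance via the t1..t4 definition."""
    pos = {a: i for i, a in enumerate(attributes)}
    rows = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[pos[a]] == t2[pos[a]] for a in alpha):
                # t3 takes its beta values from t1 and its R - beta values from t2.
                t3 = tuple(t1[pos[a]] if a in beta else t2[pos[a]] for a in attributes)
                if t3 not in rows:
                    return False
    return True


attrs = ("Student", "Subject", "Language")
student = {("Geeta", "Mythology", "English"), ("Geeta", "Psychology", "English"),
           ("Geeta", "Mythology", "Hindi"), ("Geeta", "Psychology", "Hindi"),
           ("Shekher", "Gardening", "English")}
print(mvd_holds(attrs, student, {"Student"}, {"Subject"}))   # True
print(mvd_holds(attrs, student - {("Geeta", "Psychology", "Hindi")},
                {"Student"}, {"Subject"}))                   # False
```

Removing one row breaks the MVD, which illustrates why an MVD is tuple-generating: it demands that certain tuples be present.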
24. 4 NF
• A table is in fourth normal form (4NF) if it is in BCNF and does not have any independent multi-valued parts of the primary key
• If there are two attributes A and B such that a given value of A is associated with a set of multiple values of B, and that set is independent of the values of the remaining attributes, then we say that an MVD A ↠ B exists
• The normal forms after BCNF are mainly of theoretical interest
25. 4 NF
Student Table
Student    Subject      Language
Geeta      Mythology    English
Geeta      Psychology   English
Geeta      Mythology    Hindi
Geeta      Psychology   Hindi
Shekher    Gardening    English
Multivalued dependencies: Student ↠ Subject, Student ↠ Language
26. 4 NF
• Split the independent multi-valued components of the primary key into two tables
• The primary key of the Student table is (Student, Subject, Language)
Student_Subject Table
Student    Subject
Geeta      Mythology
Geeta      Psychology
Shekher    Gardening
Student_Language Table
Student    Language
Geeta      English
Geeta      Hindi
Shekher    English
• This takes care of the update anomaly
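A quick way to confirm that the 4NF split loses no information is to project the Student table onto the two new tables and join them back on Student. The sketch below uses Python sets of tuples as a stand-in for tables (an assumption of this illustration).

```python
student = {("Geeta", "Mythology", "English"), ("Geeta", "Psychology", "English"),
           ("Geeta", "Mythology", "Hindi"), ("Geeta", "Psychology", "Hindi"),
           ("Shekher", "Gardening", "English")}

# Project onto the two tables of the 4NF decomposition (sets drop duplicates).
student_subject = {(s, subject) for s, subject, _ in student}
student_language = {(s, language) for s, _, language in student}

# Natural join on Student rebuilds the original table.
rejoined = {(s, subject, language)
            for s, subject in student_subject
            for s2, language in student_language if s == s2}

print(len(student_subject), len(student_language))  # 3 3
print(rejoined == student)                           # True
```

In the original table, recording one more language for Geeta needs one new row per subject she takes; after the split it needs a single row in Student_Language.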
27. Surprise: Lossless Decomposition
• There exist relations that cannot be non-loss decomposed into two projections, but can be non-loss decomposed into three or more
28. Join Dependency
• Definition: A relation R satisfies the join dependency (JD) *(X, Y, …, Z) iff R is equal to the join of its projections on X, Y, …, Z, where X, Y, …, Z are subsets of the set of attributes of R.
• Consider the following table of Suppliers (S), Parts (P) and the Locations they supply (L):
ACTUAL (SPL)       DECOMPOSITION
S    P    L        S    P        P    L
S1   P1   L2       S1   P1       P1   L2
S1   P2   L1       S1   P2       P2   L1
S2   P1   L1       S2   P1       P1   L1
S1   P1   L1
29. Join Dependency
ACTUAL (SPL)       DECOMPOSITION
S    P    L        S    P        P    L
S1   P1   L2       S1   P1       P1   L2
S1   P2   L1       S1   P2       P2   L1
S2   P1   L1       S2   P1       P1   L1
S1   P1   L1
Join of SP and PL:
S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1
S2   P1   L2   ← spurious tuple
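30. Join Dependency
ACTUAL (SPL)       DECOMPOSITION
S    P    L        S    P        P    L        L    S
S1   P1   L2       S1   P1       P1   L2       L2   S1
S1   P2   L1       S1   P2       P2   L1       L1   S1
S2   P1   L1       S2   P1       P1   L1       L1   S2
S1   P1   L1
Join of SP, PL and LS:
S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1
• The spurious tuple (S2, P1, L2) disappears because (L2, S2) is not in LS, so SPL satisfies the JD *(SP, PL, LS)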
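The three join-dependency slides can be reproduced with Python sets of tuples (an assumption of this sketch, standing in for tables). The two-way join of SP and PL contains one spurious tuple; joining that result with LS on (L, S) keeps only tuples whose (L, S) pair appears in LS, which recovers SPL exactly.

```python
spl = {("S1", "P1", "L2"), ("S1", "P2", "L1"), ("S2", "P1", "L1"), ("S1", "P1", "L1")}

# Projections SP, PL and LS (sets drop duplicate tuples).
sp = {(s, p) for s, p, _ in spl}
pl = {(p, l) for _, p, l in spl}
ls = {(l, s) for s, _, l in spl}

# Two-way natural join of SP and PL on P.
two_way = {(s, p, l) for s, p in sp for p2, l in pl if p == p2}

# LS shares both of its attributes with the result, so joining with it
# simply filters out tuples whose (L, S) pair is absent from LS.
three_way = {(s, p, l) for s, p, l in two_way if (l, s) in ls}

print(sorted(two_way - spl))   # [('S2', 'P1', 'L2')]
print(three_way == spl)        # True
```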
31. 5 NF
• A table is in fifth normal form (5NF) if it is in fourth normal form and every join dependency in the table is implied by the candidate keys
• It is also called Project-Join Normal Form (PJNF)
32. Normalization
Un-normalized Relation
    ↓ Arrange a single atomic value in every cell (intersection of a row and a column) of the table
First Normal Form (1NF)
    ↓ Eliminate partial dependencies
Second Normal Form (2NF)
    ↓ Eliminate transitive dependencies
Third Normal Form (3NF)
    ↓ Make every determinant a candidate key
Boyce-Codd Normal Form (BCNF)
    ↓ Eliminate multivalued dependencies that are not functional dependencies
Fourth Normal Form (4NF)
    ↓ Eliminate join dependencies that are not implied by candidate keys
Fifth Normal Form (5NF)
33. Denormalization
• Denormalization is a process in which we retain or introduce some amount of redundancy for faster data access
• It is applied where trade-offs arise between fast access and the maintenance of integrity
34. Summary
• Normalization helps to reduce redundancy and some anomalies
• The first three normal forms (1NF, 2NF and 3NF) are practical, but BCNF, 4NF and 5NF are more of theoretical interest
• Denormalization is done for fast access