Relational Database Design - Lecture 4 - Introduction to Databases (1007156ANR)

This lecture is part of an Introduction to Databases course given at the Vrije Universiteit Brussel.

Transcript

  • 1. Introduction to Databases - Relational Database Design
    Prof. Beat Signer, Department of Computer Science, Vrije Universiteit Brussel
    http://www.beatsigner.com
    2 December 2005
  • 2. Relational Database Design
    - There are two major relational database design approaches
    - Top-down design
      - develop a conceptual model (e.g. ER model)
      - reduction (mapping) of the conceptual model to relation schemas
      - use normalisation as a validation technique to check the quality of the resulting relation schemas
        - a relational database schema resulting from the mapping of a good ER model (with the correct entity sets) normally requires no further normalisation
    - Bottom-up design
      - design by decomposition
      - use normalisation to iteratively create (decompose) a set of relations starting with a single relation
    March 7, 2014 Beat Signer - Department of Computer Science - bsigner@vub.ac.be
  • 3. Relational Database Design ...
    - A relation schema might contain certain dependencies, in which case it should be decomposed (normalised) into multiple smaller relation schemas
      - this normalisation process is based on functional dependencies and multivalued dependencies
    - Sometimes multiple relations resulting from an ER to relation schema reduction might be merged to save some join query operations
      - we have to ensure that the resulting larger relation schema does not introduce new undesirable dependencies
  • 4. Reduction
    - A conceptual ER model can be reduced to a set of relation schemas (relational database schema)
    - The quality of the resulting set of relation schemas depends on the quality of the original ER design
    - In the following we discuss the reduction of the different ER model concepts introduced earlier
  • 5. Strong Entity Sets
    - A strong entity set E with only simple attributes a1,..., an is mapped to a relation R with attributes a1,..., an
      - the primary key of the entity set E becomes the primary key of the relation R
    [ER diagram: entity set Employees with attributes id and name]
    Relation schema: Employee (id, name)
    [table: relation employee = (Employee) with columns id and name; extracted values are the ids 1234, 1576, 3212 and the names Beat Signer, Lode Hoste, Sandra Trullemans, ...; the pairing of ids and names is not recoverable from the transcript]
  • 6. Composite Attributes
    - For each component of a composite attribute, we create an attribute ai in the relation R
      - no special attribute is created for the composite attribute itself
    [ER diagram: entity set Employees with attributes id, name and the composite attribute address (street, city)]
    Employee (id, name, street, city)
  • 7. Multivalued Attributes
    - Multivalued attributes are treated separately since a relation should only contain attributes with atomic values
      - for each multivalued attribute ai of an entity set E, we create a new relation S containing the attribute ai as well as the primary key attributes of the relation R that is created for the entity set E
        - define a foreign key constraint to the original relation R
    [ER diagram: entity set Employees with attributes id, name and the multivalued attribute phone]
    Phones (id, phone)
    phones = (Phones)
      id    phone
      1234  032 2 612 1337
      1234  032 2 612 3123
      1576  032 2 623 8765
      ...   ...
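
The mapping of the multivalued phone attribute can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column types, the placeholder employee names and the helper functions `phones_of` and `insert_phone` are assumptions and not part of the lecture.

```python
import sqlite3

# In-memory database; table and column names follow the slide, types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE Phones (
           id INTEGER NOT NULL REFERENCES Employee(id),
           phone TEXT NOT NULL,
           PRIMARY KEY (id, phone)
       )"""
)

# Placeholder names; only the ids and phone numbers are taken from the slide.
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [(1234, "employee A"), (1576, "employee B")])
conn.executemany("INSERT INTO Phones VALUES (?, ?)",
                 [(1234, "032 2 612 1337"),
                  (1234, "032 2 612 3123"),
                  (1576, "032 2 623 8765")])


def phones_of(employee_id):
    """Return all phone numbers stored for one employee, sorted."""
    rows = conn.execute(
        "SELECT phone FROM Phones WHERE id = ? ORDER BY phone", (employee_id,))
    return [phone for (phone,) in rows]


def insert_phone(employee_id, phone):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO Phones VALUES (?, ?)", (employee_id, phone))
        return True
    except sqlite3.IntegrityError:
        return False
```

Each phone number becomes its own row, so every value stays atomic, and the foreign key rejects phone numbers for employees that do not exist.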
  • 8. Weak Entity Sets
    - A weak entity set E with attributes a1,..., an is mapped to a relation R with attributes a1,..., an combined with the primary key attributes b1,..., bm of the identifying entity set F
      - the primary key of R is defined by the primary key attributes of the identifying entity set F combined with the discriminator of E
      - a foreign key constraint is defined from the attributes b1,..., bm to the primary key of the relation that is created for the identifying entity set F
  • 9. Weak Entity Sets ...
    [ER diagram: entity set Cinemas (id, name) connected via the relationship set Offers to the weak entity set Seats (number, colour)]
    Seat (id, number, colour)
    seat = (Seat)
      id   number  colour
      1    1       red
      1    20      black
      4    1       black
      ...  ...     ...
  • 10. Relationship Sets
    - A relationship set over the entity sets E1,..., En with the optional descriptive attributes b1,..., bm is mapped to a relation R with the primary key attributes of E1,..., En combined with b1,..., bm
    - The primary key of relation R is defined as follows
      - binary many-to-many relationship
        - union of all primary key attributes of E1 and E2
      - binary one-to-one relationship
        - choose the primary key of E1 or E2
      - binary one-to-many or many-to-one relationship
        - choose the primary key of the entity set on the "many" side
  • 11. Relationship Sets ...
    - The primary key of relation R is defined as follows ...
      - n-ary relationship without cardinality constraints
        - union of all primary key attributes of E1,..., En
      - n-ary relationship with one 0..1 or 1..1 cardinality constraint over the entity set Ej
        - union of all primary key attributes of E1,..., En, except the primary key of Ej
        - note that we allow only one such 0..1 or 1..1 cardinality constraint for n-ary relationships
    - A foreign key constraint is defined for each set of primary key attributes (provided by the entity set Ei) to the primary key of the corresponding relation that is defined for Ei
  • 12. Relationship Sets ...
    [ER diagram: entity set Employees (id, name) connected via the relationship set LocatedAt (descriptive attribute duration) to the entity set Offices (name, size, address)]
    LocatedAt (id, name, address, duration)
    locatedAt = (LocatedAt)
      id    name    address      duration
      1234  10F721  Pleinlaan 2  1
      1576  10F733  Pleinlaan 2  1
      ...   ...     ...          ...
  • 13. Relationship Sets ...
    [ER diagram: entity set Employees (id, name) connected via the relationship set LocatedAt (descriptive attribute duration) to the entity set Offices (name, size, address), with the cardinality constraint 1..1 on the Employees side and 0..* on the Offices side]
    LocatedAt (id, name, address, duration)
    locatedAt = (LocatedAt)
      id    name    address      duration
      1234  10F721  Pleinlaan 2  1
      1576  10F733  Pleinlaan 2  1
      ...   ...     ...          ...
  • 14. Weak Entity Existence Relationship
    - The special relationship set from a weak entity set to its defining entity set is always a many-to-one relationship
      - the special weak entity existence relationship does not have to be mapped to a separate relation since it is already covered by the relation that is created for the weak entity set
        - e.g. a potential Offers relation schema is already covered by the Seat relation schema
    [ER diagram: entity set Cinemas (id, name) connected via the relationship set Offers to the weak entity set Seats (number, colour)]
    Seat (id, number, colour)
  • 15. Combination of Schemas
    - Relations resulting from the mapping of a relationship set with a total participation constraint can be integrated with the relation over which the constraint is defined
      - key of the relation with the constraint (1..1) used as primary key
      - also works for partial relationships (have to use null values)
    [ER diagram: entity set Employees (id, name) connected via the relationship set LocatedAt (descriptive attribute duration) to the entity set Offices (name, size, address), with the cardinality constraint 1..1 on the Employees side and 0..* on the Offices side]
    Employee (id, employeeName, duration, name, address)
    Office (name, address, size)
  • 16. Specialisation and Generalisation
    - Create a new relation R for each entity set (superclass and subclasses)
      - for a subclass, combine the attributes of the entity set with the primary key attributes of the superclass
    [ER diagram: entity set Persons (id, name) with an ISA relationship to the entity sets Students (studentID) and Teachers (teaching hours)]
    Person (id, name)
    Student (id, studentID)
    Teacher (id, teachingHours)
  • 17. Specialisation and Generalisation ...
    - For a disjoint and total ISA constraint we might omit the separate superclass relation
      - saves some join operations but it is no longer possible to define a foreign key constraint on the id attribute (now at two places)
    [ER diagram: entity set Persons (id, name) with a disjoint ISA relationship to the entity sets Students (studentID) and Teachers (teaching hours)]
    Student (id, name, studentID)
    Teacher (id, name, teachingHours)
  • 18. Aggregations
    - Like the regular relationship set mapping
      - note that the name attribute is the one from the Companies entity set
    [ER diagram: relationship set WorksFor over the entity sets Employees (id, name), Companies (name, address) and Durations (from, to); the relationship set Manages connects the WorksFor aggregation with the entity set Managers (mId, name)]
    Manages (id, from, to, name, address, mId)
  • 19. Relational Database Design
    - The goal of relational database design is to create a set of relation schemas that
      - can be used to store information without unnecessary redundancy
      - allow us to easily retrieve information
    - The quality of the set of schemas resulting from a reduction (top-down design) depends on how good the original ER design was
    - In a design by decomposition approach (bottom-up design) we need a way to reduce any redundancy via a decomposition process
      - split large relations into multiple smaller relations
  • 20. Update Anomalies
    - Insertion anomaly
      - redundant information has to be kept consistent
        - e.g. insertion of a new order for an already existing CD
      - information about a CD can only be inserted if there is an order, or we have to populate the customer information (i.e. name and street) with null values
    Order (id, name, street, cdName, price)
    order = (Order)
      id  name             street            cdName              price
      1   Max Frisch       Bahnhofstrasse 7  Falling into Place  17.90
      2   Eddy Merckx      Pleinlaan 25      Falling into Place  17.90
      53  Albert Einstein  Bergstrasse 18    Chromatic           16.50
      5   Max Frisch       Bahnhofstrasse 7  Carcassonne         15.50
  • 21. Update Anomalies ...
    - Modification anomaly
      - if we want to modify information about a particular CD, we have to ensure that the information is updated in all redundant entries
        - e.g. modification of the price of the CD named "Falling into Place"
    - Deletion anomaly
      - if we delete a customer who is the only buyer of a specific CD, we also lose the information about that specific CD
        - e.g. deletion of the customer "Albert Einstein"
    order = (Order)
      id  name             street            cdName              price
      1   Max Frisch       Bahnhofstrasse 7  Falling into Place  17.90
      2   Eddy Merckx      Pleinlaan 25      Falling into Place  17.90
      53  Albert Einstein  Bergstrasse 18    Chromatic           16.50
      5   Max Frisch       Bahnhofstrasse 7  Carcassonne         15.50
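
The modification and deletion anomalies can be reproduced on the order instance from the slides with a few lines of Python. This is a rough sketch: the helper `inconsistent_cds` and the new price 18.90 are made up for illustration.

```python
# Order (id, name, street, cdName, price) instance as shown on the slides
orders = [
    (1, "Max Frisch", "Bahnhofstrasse 7", "Falling into Place", 17.90),
    (2, "Eddy Merckx", "Pleinlaan 25", "Falling into Place", 17.90),
    (53, "Albert Einstein", "Bergstrasse 18", "Chromatic", 16.50),
    (5, "Max Frisch", "Bahnhofstrasse 7", "Carcassonne", 15.50),
]


def inconsistent_cds(rows):
    """Return the CD names that are stored with more than one price."""
    prices = {}
    for _id, _name, _street, cd_name, price in rows:
        prices.setdefault(cd_name, set()).add(price)
    return {cd for cd, values in prices.items() if len(values) > 1}


# Modification anomaly: the price is changed in order 1 only (18.90 is a made-up value),
# so the redundant copy in order 2 becomes stale.
modified = [(i, n, s, cd, 18.90 if i == 1 else p) for (i, n, s, cd, p) in orders]

# Deletion anomaly: removing the customer also removes the only row describing "Chromatic".
after_delete = [row for row in orders if row[1] != "Albert Einstein"]
known_cds = {row[3] for row in after_delete}
```

After the partial update, `inconsistent_cds(modified)` reports "Falling into Place", and after the deletion "Chromatic" no longer appears in `known_cds`.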
  • 22. Normalisation
    - Normalisation is a formal method to analyse relation schemas based on their keys, functional dependencies (FD) as well as multivalued dependencies (MVD)
      - remove redundancy
      - prevent certain update anomalies
        - insertion, modification and deletion
    - There exists a set of rules to check if a relation is in a specific normal form
    [figure: hierarchy of normal forms, listed from stronger to weaker]
      Fifth Normal Form (5NF)
      Fourth Normal Form (4NF)
      Boyce-Codd Normal Form (BCNF)
      Third Normal Form (3NF)
      Second Normal Form (2NF)
      First Normal Form (1NF)
      (1NF, 2NF and 3NF are the original normal forms described by Codd)
  • 23. Normalisation ...
    - A relation that does not conform to a certain degree of normalisation can be decomposed (lossless-join decomposition) into multiple relations that are in the desired normal form
      - can be done automatically
    - Normalisation is often done in a stepwise manner
      - a higher normal form means a more restricted format and fewer problems with update anomalies
      - note that only the first normal form (1NF) is mandatory for the relational model and all the other normal forms are optional
  • 24. First Normal Form (1NF)
    - As we have seen earlier, the ER model supports complex attributes
      - composite attributes
      - multivalued attributes
    - In the reduction process, we remove this substructure from attributes to create a relational model with atomic attribute values only
    - A relation schema R is in first normal form (1NF) if the domains D1,..., Dn of all attributes a1,..., an of R are atomic
      - no composite attributes or attributes with a set of values
      - the intersection of each row and column contains one and only one value
  • 25. Functional Dependencies
    TeacherDept (teacherID, teacher, salary, department, building, budget)
    - In this example, there are various sets of attributes that uniquely identify a set of other attributes
      - teacherID → teacher
      - teacherID → salary
      - teacherID → {teacher, salary}
      - {teacherID, teacher} → {salary}
      - department → {building, budget}
      - ...
    - We say that there is a functional dependency (→) between these two sets of attributes
      - a functional dependency should always hold on a relation schema and not just on a particular relation instance
  • 26. Functional Dependencies ...
    - A functional dependency can be used to express constraints (generalisation of keys) over a set of attributes (determinant) that uniquely identify a set of other attributes (dependent attributes)
    - For a relation schema R with α ⊆ R and β ⊆ R the functional dependency α → β holds on R, if for any r(R)
      - ∀ t1, t2 ∈ r(R) with t1[α] = t2[α] ⇒ t1[β] = t2[β]
    - Note that any K ⊆ R is a superkey if K → R
      - we can use functional dependencies to check whether K is a superkey
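
The tuple-based definition translates almost directly into a check on a single relation instance. The following sketch assumes a relation given as a list of dictionaries; the function name `fd_holds` and all sample values are hypothetical, only the TeacherDept attribute names come from the lecture.

```python
def fd_holds(rows, alpha, beta):
    """Check whether alpha -> beta holds on one relation instance.

    rows is a list of dicts (one per tuple); alpha and beta are lists of
    attribute names. Two tuples that agree on alpha must agree on beta.
    """
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in alpha)
        value = tuple(t[b] for b in beta)
        # setdefault stores the first beta value seen for this alpha value
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical TeacherDept instance (values are made up)
teacher_dept = [
    {"teacherID": 1, "teacher": "T1", "salary": 100,
     "department": "CS", "building": "F", "budget": 10},
    {"teacherID": 2, "teacher": "T2", "salary": 120,
     "department": "CS", "building": "F", "budget": 10},
    {"teacherID": 3, "teacher": "T3", "salary": 100,
     "department": "Math", "building": "G", "budget": 7},
]
```

Note that such a check can only refute a functional dependency. A dependency that happens to hold on one instance is not necessarily a constraint of the relation schema.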
  • 27. Functional Dependencies ...
    - The relation r(R) satisfies the following set F of functional dependencies
      - A → B
      - C → E
      - ...
    r(R)
      A   B   C   D   E
      a1  b1  c1  d1  e1
      a2  b2  c2  d1  e2
      a2  b2  c3  d1  e3
      a3  b2  c4  d3  e3
    - A functional dependency α → β is trivial if β ⊆ α
      - trivial dependencies are satisfied by all relations
    - A full functional dependency has a minimal determinant
      - if the determinant is not minimal, we talk about a partial functional dependency (e.g. AD → B in the example)
    - For a relation r(R) with α → β and β → γ we say that γ is transitively dependent on α via β
  • 28. Closure of Attributes
    - For a given relation schema R, a number of functional dependencies and a set of attributes α ⊆ R, the closure α+ is defined by all attributes Bi such that α → Bi
    - Computing the closure
        Initialise the set s with the attributes of α
        Repeat until the set s does not grow anymore {
          if there is a functional dependency β → γ and β is in s, then add γ to the set s
        }
    - If the closure α+ contains all attributes of the relation schema R, then the attributes α form a superkey of R
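
The closure computation can be written as a short Python function. In this sketch a functional dependency is represented as a pair of attribute sets (a representation chosen here for illustration), and F contains only two of the TeacherDept dependencies listed on slide 25.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under the functional dependencies fds.

    fds is a list of (lhs, rhs) pairs, each side being a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until the set does not grow anymore
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Two of the dependencies listed for TeacherDept (the elided ones are left out)
F = [
    ({"teacherID"}, {"teacher", "salary"}),
    ({"department"}, {"building", "budget"}),
]
schema = {"teacherID", "teacher", "salary", "department", "building", "budget"}
```

With these two dependencies alone, the closure of {teacherID} does not cover the schema, while the closure of {teacherID, department} does, so the latter is a superkey under this assumed F.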
  • 29. Computation of Candidate Keys
    - We can test whether α is a candidate key for a given relation schema R by checking whether the closure α+ contains all attributes of R and no proper subset of α has this property (otherwise α is only a superkey)
    - We can further use this approach to find all the candidate keys for a relation schema R and a given set of functional dependencies
      - check for each set α ⊆ R of attributes whether the closure α+ contains all attributes
      - the search process can be slightly optimised by starting with the smallest possible subsets
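
A brute-force version of this search could look as follows. The schema R(A, B, C, D) and its dependencies are a made-up example, and the function names are illustrative. Subsets are tried from small to large, so any superset of an already found key can be skipped as non-minimal.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Enumerate all candidate keys by testing subsets in order of increasing size."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys


# Hypothetical example: AB -> C, C -> D, D -> A
example_fds = [({"A", "B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"A"})]
keys = candidate_keys({"A", "B", "C", "D"}, example_fds)
```

The search is exponential in the number of attributes, which is acceptable for lecture-sized schemas only.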
  • 30. Functional Dependency Inference
    - For a given set F of functional dependencies we can derive new functional dependencies based on a set of axioms to compute the closure F+ of F
      - the closure F+ includes all functional dependencies that are logically implied by F
    - Three rules (Armstrong's axioms) can be used to compute F+
      - reflexivity
        - for a given set of attributes α and β ⊆ α, α → β holds (see trivial dependency)
      - augmentation
        - for a given set of attributes γ: if α → β, then γα → γβ holds
      - transitivity
        - if α → β and β → γ, then α → γ holds
  • 31. Functional Dependency Inference ...
    - Armstrong's axioms are sound (produce only elements of F+) and complete (produce all elements in F+)
      - since it may take a lot of time to compute F+ with Armstrong's axioms only, there exist some additional rules
    - Decomposition
      - if α → βγ, then α → β and α → γ hold
    - Union
      - if α → β and α → γ, then α → βγ holds
    - Trivial dependency rules
      - if α → β, then α → α ∪ β holds
      - if α → β, then α → α ∩ β holds
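
As a worked example of how an additional rule follows from Armstrong's axioms, the union rule can be derived in five steps. The derivation below is a standard textbook argument and is not taken from the slides.

```latex
\begin{align*}
1.\;& \alpha \to \beta && \text{given} \\
2.\;& \alpha \to \alpha\beta && \text{augmentation of 1 with } \alpha \text{ (note that } \alpha\alpha = \alpha\text{)} \\
3.\;& \alpha \to \gamma && \text{given} \\
4.\;& \alpha\beta \to \beta\gamma && \text{augmentation of 3 with } \beta \\
5.\;& \alpha \to \beta\gamma && \text{transitivity of 2 and 4}
\end{align*}
```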
  • 32. Second Normal Form (2NF)
    - A relation schema R is in second normal form (2NF) if it is in 1NF and if there exists no non-prime attribute that is functionally dependent on a part of a candidate key
      - every non-prime attribute has to be fully functionally dependent on a candidate key
      - a non-prime attribute is an attribute that is not part of any candidate key
      - the Lecturer relation schema shown in the example is not in 2NF since the office attribute functionally depends on the teacher attribute
    Lecturer (teacher, course, office)
    lecturer = (Lecturer)
      teacher            course     office
      Beat Signer        Databases  10G731d
      Beat Signer        WIS        10G731d
      Lode Hoste         Databases  10F716
      Lode Hoste         ATIS       10F716
      Sandra Trullemans  WIS        10G731e
  • 33. Second Normal Form (2NF) ...
    - 2NF normalisation process
      - remove any partially dependent attributes from the relation and put them in a new relation together with their determinant
    - The original Lecturer relation can be losslessly decomposed into two relations which are both in 2NF
      - relations with single attribute keys are automatically in 2NF
    Lecturer (teacher, office)
    Course (teacher, course)
    course = (Course)
      teacher            course
      Beat Signer        Databases
      Beat Signer        WIS
      Lode Hoste         Databases
      Lode Hoste         ATIS
      Sandra Trullemans  WIS
    lecturer = (Lecturer)
      teacher            office
      Beat Signer        10G731d
      Lode Hoste         10F716
      Sandra Trullemans  10G731e
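
The decomposition can be checked on the example instance by projecting and joining again. The sketch below uses plain Python sets of tuples. The second decomposition (splitting on course instead of teacher) is not from the slides and is added only to contrast a lossy split.

```python
# Original Lecturer (teacher, course, office) instance from the slides
lecturer = {
    ("Beat Signer", "Databases", "10G731d"),
    ("Beat Signer", "WIS", "10G731d"),
    ("Lode Hoste", "Databases", "10F716"),
    ("Lode Hoste", "ATIS", "10F716"),
    ("Sandra Trullemans", "WIS", "10G731e"),
}

# Projections onto the two schemas of the 2NF decomposition
lecturer_office = {(t, o) for (t, c, o) in lecturer}  # Lecturer (teacher, office)
course = {(t, c) for (t, c, o) in lecturer}           # Course (teacher, course)

# Natural join on the shared attribute teacher
rejoined = {(t, c, o) for (t, c) in course
            for (t2, o) in lecturer_office if t == t2}

# Contrast (not from the slides): splitting on course instead of teacher
course_office = {(c, o) for (t, c, o) in lecturer}
lossy = {(t, c, o) for (t, c) in course
         for (c2, o) in course_office if c == c2}
```

Joining on teacher reproduces exactly the original tuples, whereas joining on course produces spurious tuples such as ("Beat Signer", "Databases", "10F716").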
  • 34. Lossless Decomposition
    - Given a relation schema R and the two decompositions R1 and R2 of R, we say that R1 and R2 form a lossless decomposition if π_R1(r) ⋈ π_R2(r) = r
    - Let F be a set of functional dependencies on R
      - R1 and R2 form a lossless decomposition of R if either R1 ∩ R2 → R1 or R1 ∩ R2 → R2 is in F+
        - this means that R1 ∩ R2 is a superkey of R1 or R2
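
The functional dependency test for a lossless decomposition can be sketched with an attribute closure, which avoids computing F+ explicitly. The function names are illustrative, and the single dependency teacher → office is the one stated for the Lecturer example.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def passes_lossless_test(r1, r2, fds):
    """True if R1 ∩ R2 functionally determines all of R1 or all of R2."""
    reachable = closure(set(r1) & set(r2), fds)
    return set(r1) <= reachable or set(r2) <= reachable


# Dependency from the Lecturer example: teacher -> office
lecturer_fds = [({"teacher"}, {"office"})]
```

A result of False only means that this sufficient condition is not met. It does not by itself prove that the decomposition loses information.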
  • 35. Third Normal Form (3NF)
    - A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute is transitively dependent on a candidate key, i.e. for all functional dependencies α → β in F+ one of the following has to hold
      - α → β is a trivial functional dependency (i.e. β ⊆ α)
      - α is a superkey of R
      - each attribute Ai in β − α is contained in a candidate key of R
        - note that each Ai can be in different candidate keys
    - Each non-key attribute "must provide a fact about the key, the whole key, and nothing but the key" [Bill Kent]
  • 36. Third Normal Form (3NF) ...
    - The Prize relation example schema is in 2NF
    - The Prize relation schema is not in 3NF since birthdate is functionally dependent on winner and none of the three conditions holds for this functional dependency
      - birthdate is transitively dependent on the key (award, year)
    Prize (award, year, winner, birthdate)
    prize = (Prize)
      award              year  winner         birthdate
      ACM Turing Award   1981  Edgar F. Codd  23.08.1923
      Nobel Peace Prize  1979  Mother Teresa  26.08.1910
      ACM Turing Award   1984  Niklaus Wirth  15.02.1934
      Nobel Peace Prize  1984  Desmond Tutu   07.10.1931
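
The three 3NF conditions can be checked mechanically for the Prize example. The sketch below tests only the dependencies in a given set F rather than all of F+ (a common simplification) and finds candidate keys by brute force. All function names are illustrative, and the dependency (award, year) → winner is read off from the key stated on the slide.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force search for all minimal attribute sets whose closure is the schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if not any(k <= candidate for k in keys) and closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys


def violations_3nf(schema, fds):
    """Return the dependencies in fds for which none of the three 3NF conditions holds."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))  # attributes in some candidate key
    bad = []
    for alpha, beta in fds:
        alpha, beta = set(alpha), set(beta)
        trivial = beta <= alpha
        superkey = closure(alpha, fds) >= schema
        only_prime = (beta - alpha) <= prime
        if not (trivial or superkey or only_prime):
            bad.append((alpha, beta))
    return bad


prize_schema = {"award", "year", "winner", "birthdate"}
prize_fds = [({"award", "year"}, {"winner"}), ({"winner"}, {"birthdate"})]
```

For the Prize schema the only reported violation is winner → birthdate, which matches the argument on the slide.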
  • 37. Third Normal Form (3NF) ...
    - 3NF normalisation process
      - remove any transitively dependent attributes from the relation and place them in a new relation together with their determinant
    - Decomposition of the Prize relation schema into two 3NF relation schemas
    Prize (award, year, winner)
    Birthdate (winner, birthdate)
    prize = (Prize)
      award              year  winner
      ACM Turing Award   1981  Edgar F. Codd
      Nobel Peace Prize  1979  Mother Teresa
      ACM Turing Award   1984  Niklaus Wirth
      Nobel Peace Prize  1984  Desmond Tutu
    bdate = (Birthdate)
      winner         birthdate
      Edgar F. Codd  23.08.1923
      Mother Teresa  26.08.1910
      Niklaus Wirth  15.02.1934
      Desmond Tutu   07.10.1931
  • 38. Boyce-Codd Normal Form (BCNF)
    - The Boyce-Codd normal form is a stronger form of 3NF
    - A relation schema R is in Boyce-Codd Normal Form (BCNF) if it is in 3NF and if every determinant is a candidate key, i.e. for all functional dependencies α → β in F+ one of the following holds
      - α → β is a trivial functional dependency (i.e. β ⊆ α)
      - α is a superkey of R
    - Any relation that is in BCNF is also in 3NF since the BCNF conditions are equivalent to the first two 3NF conditions
  • 39. BCNF Decomposition
    - If a relation R is not in BCNF, then there exists at least one nontrivial functional dependency α → β where α is not a superkey of R
      - the relation R can then be decomposed into the two relation schemas R1 (α ∪ β) and R2 (R − (β − α))
    - We can for example apply the BCNF decomposition to the previous Prize relation schema example with the functional dependency winner → birthdate
      - α ∪ β = (winner, birthdate)
      - (R − (β − α)) = (award, year, winner)
    - Further details about the algorithms for BCNF and 3NF decomposition can be found in the course book
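
A single BCNF decomposition step for the Prize example might be sketched as follows. The code only inspects the dependencies given in F (which is enough to test the original schema, but not the resulting sub-schemas), and the names `bcnf_violation` and `decompose_once` are illustrative.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violation(schema, fds):
    """Return the first non-trivial dependency whose determinant is not a superkey, else None."""
    schema = set(schema)
    for alpha, beta in fds:
        alpha, beta = set(alpha), set(beta)
        if not beta <= alpha and not closure(alpha, fds) >= schema:
            return alpha, beta
    return None


def decompose_once(schema, alpha, beta):
    """Split R into R1 = alpha ∪ beta and R2 = R − (beta − alpha)."""
    schema = set(schema)
    return alpha | beta, schema - (beta - alpha)


prize_schema = {"award", "year", "winner", "birthdate"}
prize_fds = [({"award", "year"}, {"winner"}), ({"winner"}, {"birthdate"})]

alpha, beta = bcnf_violation(prize_schema, prize_fds)
r1, r2 = decompose_once(prize_schema, alpha, beta)
```

A full decomposition algorithm would repeat this step on R1 and R2 with the dependencies projected onto each schema, as described in the course book.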
  • 40. Multivalued Dependencies
    - Some relation schemas that are in BCNF may still contain redundant information
    - The fourth normal form (4NF) deals with some of these problems based on multivalued dependencies
      - for a given relation schema R with α ⊆ R and β ⊆ R the multivalued dependency α ↠ β holds if for all pairs of tuples t1 and t2 in r(R) (with t1[α] = t2[α]) there exist tuples t3 and t4 in r(R) such that
        - t1[α] = t2[α] = t3[α] = t4[α]
        - t3[β] = t1[β]
        - t3[R − β] = t2[R − β]
        - t4[β] = t2[β]
        - t4[R − β] = t1[R − β]
    [table: tuples t1 to t4 split into the attribute groups α, β and R − α − β]
          α        β            R − α − β
      t1  a1...ai  ai+1...aj    aj+1...an
      t2  a1...ai  bi+1...bj    bj+1...bn
      t3  a1...ai  ai+1...aj    bj+1...bn
      t4  a1...ai  bi+1...bj    aj+1...an
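
The definition can be tested on a concrete instance by constructing the required tuple t3 for every ordered pair (t1, t2) that agrees on α. Since every pair is visited in both orders, t4 is covered as well. The relation (teacher, course, phone) and its values are a made-up example, and `mvd_holds` is an illustrative name.

```python
def mvd_holds(rows, schema, alpha, beta):
    """Check whether alpha ->> beta holds on one relation instance (list of dicts)."""
    alpha, beta = list(alpha), list(beta)
    rest = [a for a in schema if a not in alpha and a not in beta]
    present = {tuple(t[a] for a in schema) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # t3 takes alpha and beta from t1 and the remaining attributes from t2
                t3 = {a: t1[a] for a in alpha + beta}
                t3.update({a: t2[a] for a in rest})
                if tuple(t3[a] for a in schema) not in present:
                    return False
    return True


# Hypothetical relation: courses and phone numbers of a teacher are independent facts
schema = ["teacher", "course", "phone"]
full = [{"teacher": "T", "course": c, "phone": p}
        for c in ("Databases", "WIS") for p in ("111", "222")]
partial = full[:-1]  # the combination (WIS, 222) is missing
```

In `full` every course is combined with every phone number, so teacher ↠ course holds, at the price of the redundancy that 4NF is meant to remove. Dropping one combination breaks the dependency.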
  • 41. Multivalued Dependencies ...
    - Every functional dependency is also a multivalued dependency, e.g. if α → β then α ↠ β
  • 42. Fourth Normal Form (4NF)
    - A relation schema R is in fourth normal form (4NF) if it is in BCNF and if any non-trivial multivalued dependency is a dependency on a candidate key, i.e. for all multivalued dependencies α ↠ β in D+ one of the following has to hold
      - α ↠ β is a trivial multivalued dependency (i.e. β ⊆ α or β ∪ α = R)
      - α is a superkey of R
    - Note that the fourth normal form is very similar to BCNF except that we use multivalued dependencies
    - 4NF normalisation process
      - remove any multivalued dependent attributes from the relation and place them in a new relation together with their determinant
  • 43. Fifth Normal Form (5NF)
    - There are some forms of constraints called join dependencies that generalise multivalued dependencies
      - leads to the project-join normal form or fifth normal form (5NF)
      - not discussed in detail in this course
  • 44. Normalisation Summary
    - Relations in higher normal forms are less vulnerable to update anomalies
      - generally it is recommended that relations are at least in 3NF
    [figure: hierarchy of normal forms, listed from stronger to weaker, with the step leading up to each form]
      Fifth Normal Form (5NF)
        ↑ remove join dependencies
      Fourth Normal Form (4NF)
        ↑ remove multivalued dependencies
      Boyce-Codd Normal Form (BCNF)
        ↑ every determinant has to be a candidate key
      Third Normal Form (3NF)
        ↑ remove transitive dependencies
      Second Normal Form (2NF)
        ↑ remove partial dependencies
      First Normal Form (1NF)
        ↑ remove repeating groups
      Unnormalised (UN)
  • 45. Denormalisation
    - Sometimes a database designer decides to store information in a redundant way to save join operations and improve the performance
      - may result in additional work for insert, update and delete operations
    - An alternative is to keep the normalised schema and introduce additional materialised views
  • 46. Homework
    - Study the following chapters of the Database System Concepts book
      - chapter 7
        - sections 7.6 and 7.8.6
        - Reduction to Relation Schemas
      - chapter 8
        - sections 8.1-8.9
        - Relational Database Design
  • 47. Exercise 4
    - Relational algebra
    - Relational database design
      - ER to relational model reduction
  • 48. References
    - A. Silberschatz, H. Korth and S. Sudarshan, Database System Concepts (Sixth Edition), McGraw-Hill, 2010
  • 49. Next Lecture
    Structured Query Language (SQL)
    2 December 2005