1.
Relational Database Design: Normalization
Prepared by Vaishali Kalaria
2.
Design Guidelines for Relational Databases
• What is relational database design? The grouping of attributes to form "good" relation schemas.
• Two levels of relation schemas:
  • The logical "user view" level
  • The storage "base relation" level
• Design is concerned mainly with base relations.
• What are the criteria for "good" base relations?
3.
1. Semantics of the Relation Attributes
• Each tuple in a relation should represent one entity or relationship instance. (This applies to individual relations and their attributes.)
• Attributes of different entities should not be mixed in the same relation.
• Only foreign keys should be used to refer to other entities.
• Entity and relationship attributes should be kept apart as much as possible.
4.
2. Redundancy and Data Anomalies
• Redundant data is data where the same "information" is stored more than once, i.e., the redundant data could be removed without loss of information.
• Redundancy wastes storage.
• Redundancy causes problems with update anomalies:
  • Insertion anomalies
  • Deletion anomalies
  • Modification anomalies
• Design a schema that does not suffer from insertion, deletion and update anomalies.
5.
Example: the following relation contains staff and department details:

staffNo  job       dept  dname       city
SL10     Salesman  10    Sales       Stratford
SA51     Manager   20    Accounts    Barking
DS40     Clerk     20    Accounts    Barking
OS45     Clerk     30    Operations  Barking

Such ‘redundancy’ could lead to the following ‘anomalies’.
6.
• Insert Anomaly: Occurs when we need to store a value for an attribute but cannot because the value for another attribute is unknown.
  • We can't insert a dept without inserting a member of staff who works in that department.
• Update Anomaly: Occurs when a change of a single attribute in one record requires changes in multiple records.
  • We could change the name of the dept that SA51 works in without simultaneously changing it in the record for DS40, who works in the same dept.
• Deletion Anomaly: Occurs when the removal of a record results in a loss of important information about an entity.
  • By removing employee SL10 we have removed all information pertaining to the Sales dept.
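The update and deletion anomalies can be reproduced on the four example rows. The following is a minimal Python sketch that is not part of the original slides; the helper `dept_names` and the replacement name "Finance" are hypothetical and chosen only for illustration.

```python
# Rows of the combined staff/department relation from the example slide.
staff_dept = [
    {"staffNo": "SL10", "job": "Salesman", "dept": 10, "dname": "Sales", "city": "Stratford"},
    {"staffNo": "SA51", "job": "Manager", "dept": 20, "dname": "Accounts", "city": "Barking"},
    {"staffNo": "DS40", "job": "Clerk", "dept": 20, "dname": "Accounts", "city": "Barking"},
    {"staffNo": "OS45", "job": "Clerk", "dept": 30, "dname": "Operations", "city": "Barking"},
]


def dept_names(rows):
    """Map each dept number to the set of names stored for it (hypothetical helper)."""
    names = {}
    for row in rows:
        names.setdefault(row["dept"], set()).add(row["dname"])
    return names


# Update anomaly: rename dept 20 in SA51's row only ("Finance" is an assumed name).
updated = [dict(row) for row in staff_dept]
for row in updated:
    if row["staffNo"] == "SA51":
        row["dname"] = "Finance"
inconsistent = {dept for dept, names in dept_names(updated).items() if len(names) > 1}

# Deletion anomaly: removing SL10 also removes the only record of dept 10.
remaining = [row for row in staff_dept if row["staffNo"] != "SL10"]
lost_depts = set(dept_names(staff_dept)) - set(dept_names(remaining))

print(inconsistent, lost_depts)  # {20} {10}
```

Dept 20 ends up with two different names, and dept 10 disappears entirely, which is what storing each department fact once in its own relation would avoid.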
7.
3. Null Values in Tuples
• Relations should be designed such that their tuples will have as few NULL values as possible.
• Attributes that are frequently NULL could be placed in separate relations (with the primary key).
• Reasons for nulls:
  • Attribute not applicable or invalid
  • Attribute value unknown (may exist)
  • Value known to exist, but unavailable
8.
Purpose of Normalization
• To avoid redundancy by storing each "fact" within the database only once.
• To put data into a form that conforms to relational principles: no repeating groups.
• To put the data into a form that can accommodate change more accurately.
• To avoid certain updating "anomalies".
• To facilitate the enforcement of data constraints.
9.
Normalization
• "Normalization" refers to the process of creating an efficient, reliable, flexible, and appropriate "relational" structure for storing information.
• Normalized data must be in a "relational" data structure.
• It usually involves dividing a database into two or more tables and defining relationships between the tables.
• The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
10.
The Process of Normalization
• Normalization is often executed as a series of steps.
• Each step corresponds to a specific normal form that has known properties.
• As normalization proceeds, the relations become:
  • progressively more restricted in format, and
  • less vulnerable to update anomalies.
11.
Stages of Normalisation
Unnormalised form (UNF)
    Remove repeating groups
First normal form (1NF)
    Remove partial dependencies
Second normal form (2NF)
    Remove transitive dependencies
Third normal form (3NF)
    Remove remaining functional dependency anomalies
Boyce-Codd normal form (BCNF)
    Remove multivalued dependencies
Fourth normal form (4NF)
    Remove remaining anomalies
Fifth normal form (5NF)
12.
Unnormalized Form (UNF)
• Definition: A relation is unnormalized when it has not had any normalization rules applied to it, and it suffers from various anomalies.
• It is typically obtained by capturing attributes into a ‘Universal Relation’ from a screen layout, manual report, manual document, etc.
13.
ClientRental relation in UNF
Unnormalized form (UNF): a table that contains one or more repeating groups.
Repeating group = (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

ClientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      John Kay       PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
                         PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
                         PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
                         PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

Figure: ClientRental unnormalized table
14.
First Normal Form (1NF)
• Definition: A relation is in 1NF if, and only if, all its underlying attributes contain atomic values only.
• The intersection of each row and column contains one and only one value.
• Repeating groups are removed into a new relation.
• 1NF disallows having a set of values, a tuple of values, or a combination of both as an attribute value for a single tuple.
15.
1NF
There are two approaches to removing repeating groups from unnormalized tables:
1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.
2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.
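The two approaches can be sketched in Python on a cut-down version of the ClientRental data. This is an illustrative sketch only: the nested `rentals` list stands in for the repeating group, only two of its attributes are kept for brevity, and the function names `flatten` and `split` are hypothetical.

```python
# Unnormalized ClientRental: each client carries a repeating group of rentals.
client_rental_unf = [
    {"clientNo": "CR76", "cName": "John Kay",
     "rentals": [{"propertyNo": "PG4", "rent": 350},
                 {"propertyNo": "PG16", "rent": 450}]},
    {"clientNo": "CR56", "cName": "Aline Stewart",
     "rentals": [{"propertyNo": "PG4", "rent": 350},
                 {"propertyNo": "PG36", "rent": 370},
                 {"propertyNo": "PG16", "rent": 450}]},
]


def flatten(unf):
    """Approach 1: repeat the client data on every rental row."""
    return [{"clientNo": c["clientNo"], "cName": c["cName"], **r}
            for c in unf for r in c["rentals"]]


def split(unf):
    """Approach 2: move the repeating group, with a copy of the key, to a new relation."""
    client = [{"clientNo": c["clientNo"], "cName": c["cName"]} for c in unf]
    rental = [{"clientNo": c["clientNo"], **r} for c in unf for r in c["rentals"]]
    return client, rental


flat = flatten(client_rental_unf)
client, rental = split(client_rental_unf)
print(len(flat), len(client), len(rental))  # 5 2 5
```

Both results hold only atomic values. The first repeats `cName` on every row, while the second stores it once per client and identifies rental rows by (clientNo, propertyNo).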
16.
1NF ClientRental relation with the first approach
With the first approach, we remove the repeating group (property rented details) by entering the appropriate client data into each row.
The ClientRental relation is defined as follows:
ClientRental (clientNo, propertyNo, cName, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

ClientNo  propertyNo  cName          pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         John Kay       6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        John Kay       5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         Aline Stewart  6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        Aline Stewart  2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        Aline Stewart  5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

Figure: 1NF ClientRental relation with the first approach
17.
1NF ClientRental relation with the second approach
With the second approach, we remove the repeating group (property rented details) by placing the repeating data along with a copy of the original key attribute (clientNo) in a separate relation.
Client (clientNo, cName)
PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

Client
ClientNo  cName
CR76      John Kay
CR56      Aline Stewart

PropertyRentalOwner
ClientNo  propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

Figure: 1NF ClientRental relation with the second approach
19.
Second Normal Form (2NF)
• A database table is said to be in 2NF if it is in 1NF and every non-key field/column is fully functionally dependent on the primary key.
• Converting to 2NF removes the partial dependencies of any non-key field on the primary key.
• Note: It is still possible for a table in 2NF to exhibit transitive dependency; that is, one or more attributes may be functionally dependent on non-key attributes.
20.
The process of converting the database table into 2NF:
• Identify the primary key for the 1NF relation.
• Identify the functional dependencies in the relation.
• If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
22.
2NF ClientRental relation
Removing the partial dependencies creates three new relations called Client, Rental, and PropertyOwner:
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)

Client
ClientNo  cName
CR76      John Kay
CR56      Aline Stewart

Rental
ClientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03

PropertyOwner
propertyNo  pAddress                rent  ownerNo  oName
PG4         6 Lawrence St, Glasgow  350   CO40     Tina Murphy
PG16        5 Novar Dr, Glasgow     450   CO93     Tony Shaw
PG36        2 Manor Rd, Glasgow     370   CO93     Tony Shaw

Figure: 2NF ClientRental relation
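The split into Client, Rental, and PropertyOwner can be derived mechanically once the functional dependencies are written down. The sketch below is a simplified illustration, not a general normalization algorithm: the three dependencies are an assumption inferred from the resulting relations, the function names are hypothetical, and partial dependencies are detected only when they are listed explicitly.

```python
def partial_dependencies(primary_key, fds):
    """FDs whose determinant is a proper subset of the key and that
    determine at least one non-key attribute."""
    key = frozenset(primary_key)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        if lhs < key and rhs - key:
            found.append((lhs, rhs - key))
    return found


def to_2nf(attributes, primary_key, fds):
    """Move each partially dependent attribute set, with its determinant,
    into a new relation; what remains stays with the full key."""
    remaining = set(attributes)
    relations = []
    for lhs, rhs in partial_dependencies(primary_key, fds):
        relations.append(set(lhs | rhs))
        remaining -= rhs
    relations.insert(0, remaining)
    return relations


attributes = ["clientNo", "propertyNo", "cName", "pAddress", "rentStart",
              "rentFinish", "rent", "ownerNo", "oName"]
# Assumed dependencies for the 1NF ClientRental relation.
fds = [
    ({"clientNo"}, {"cName"}),
    ({"propertyNo"}, {"pAddress", "rent", "ownerNo", "oName"}),
    ({"clientNo", "propertyNo"}, {"rentStart", "rentFinish"}),
]

rental, client, property_owner = to_2nf(attributes, ["clientNo", "propertyNo"], fds)
print(sorted(rental))
```

The first relation returned keeps the full key with the attributes that depend on all of it, and the others each pair a determinant with the attributes that depended on it alone.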
23.
Third Normal Form (3NF)
Transitive dependency: a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
24.
Third normal form (3NF)
• A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.
• The normalization of 2NF relations to 3NF involves the removal of transitive dependencies by placing the attribute(s) in a new relation along with a copy of the determinant.
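As an illustration, the PropertyOwner relation from the 2NF step contains the chain propertyNo → ownerNo → oName. The Python sketch below checks these dependencies against the sample rows and then splits the relation; the helpers `holds` and `project` and the result names `property_for_rent` and `owner` are hypothetical, and a dependency that holds in sample data is only evidence, not proof, that it holds in general.

```python
def holds(rows, lhs, rhs):
    """True if the FD lhs -> rhs is satisfied by this set of rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def project(rows, attrs):
    """Projection with duplicate elimination, keeping first-seen order."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


property_owner = [
    {"propertyNo": "PG4", "pAddress": "6 Lawrence St, Glasgow", "rent": 350,
     "ownerNo": "CO40", "oName": "Tina Murphy"},
    {"propertyNo": "PG16", "pAddress": "5 Novar Dr, Glasgow", "rent": 450,
     "ownerNo": "CO93", "oName": "Tony Shaw"},
    {"propertyNo": "PG36", "pAddress": "2 Manor Rd, Glasgow", "rent": 370,
     "ownerNo": "CO93", "oName": "Tony Shaw"},
]

# oName depends on the key only through ownerNo, so it moves out with its determinant.
transitive = (holds(property_owner, ["propertyNo"], ["ownerNo"])
              and holds(property_owner, ["ownerNo"], ["oName"])
              and not holds(property_owner, ["ownerNo"], ["propertyNo"]))
property_for_rent = project(property_owner, ["propertyNo", "pAddress", "rent", "ownerNo"])
owner = project(property_owner, ["ownerNo", "oName"])
print(transitive, len(property_for_rent), len(owner))  # True 3 2
```

After the split, each owner name is stored once, and `ownerNo` remains in the property relation as a foreign key.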
28.
Boyce-Codd Normal Form (BCNF)
• A relation is in BCNF if, and only if, every determinant is a candidate key.
• BCNF is a refinement of third normal form.
• A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a non-trivial FD X -> A holds in R, X is a superkey of R.
• That is, every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF.
29.
3NF to BCNF
• Identify all candidate keys in the relation.
• Identify all functional dependencies in the relation.
• If functional dependencies exist in the relation whose determinants are not candidate keys for the relation, remove those functional dependencies by placing them in a new relation along with a copy of their determinant.
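The check "is every determinant a superkey?" can be sketched with an attribute-closure computation. The example below is a hedged illustration that borrows the Employee/Project/Branch relation and its two dependencies from the lossy decomposition slides later in this deck; the function names `closure` and `bcnf_violations` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant is not a superkey of the relation."""
    all_attrs = set(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != all_attrs]


attributes = {"Employee", "Project", "Branch"}
fds = [({"Employee"}, {"Branch"}), ({"Project"}, {"Branch"})]

print(closure({"Employee", "Project"}, fds) == attributes)  # True
print(len(bcnf_violations(attributes, fds)))  # 2
```

Here {Employee, Project} is a key, but neither Employee nor Project alone is a superkey, so both dependencies are reported and the relation is not in BCNF.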
38.
What is Decomposition?
• Decomposition: the process of breaking something down into parts or elements.
• In a database, decomposition means breaking tables down into multiple tables.
• From a database perspective, it means moving to a higher normal form.
• The aim is to break the data model into smaller relations that are in a normal form, in order to avoid redundancies.
39.
Decomposition of relation schema
• Suppose R is a relation schema, R = {A1, A2, A3, …, An}.
• R is decomposed into a set of relation schemas D = {R1, R2, R3, …, Rm}, such that Ri ⊆ R for 1 <= i <= m and R1 ⋃ R2 ⋃ R3 ⋃ … ⋃ Rm = R.
• Ex: gradeInfo(rollNo, studName, course, grade)
  • R1: gradeInfo(rollNo, course, grade)
  • R2: studInfo(rollNo, studName)
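The two conditions on D (each Ri is a subset of R, and the union of all Ri equals R) are easy to check with Python sets. This is a small sketch using the gradeInfo example; the function name `is_decomposition` is hypothetical.

```python
def is_decomposition(schema, parts):
    """True if every part is a subset of schema and the parts together cover it."""
    return all(p <= schema for p in parts) and set().union(*parts) == schema


R = {"rollNo", "studName", "course", "grade"}
D = [{"rollNo", "course", "grade"}, {"rollNo", "studName"}]

print(is_decomposition(R, D))  # True
# Dropping studInfo leaves studName uncovered, so this is not a decomposition of R.
print(is_decomposition(R, D[:1]))  # False
```

These conditions only say that no attribute is lost; they do not say whether the original rows can be reconstructed.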
41.
Decomposition
It is important that decompositions are "good". Two characteristics of good decompositions:
1) Lossless
2) Preserve dependencies
42.
Problem with Decomposition
Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation: information loss.
43.
Example: Problem with Decomposition

R
Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon

R1
Model Name  Category
a11         Canon
s20         Nikon
a70         Canon

R2
Price  Category
100    Canon
200    Nikon
150    Canon
44.
Example: Problem with Decomposition

R1 ⋈ R2 (natural join of R1 and R2)
Model Name  Price  Category
a11         100    Canon
a11         150    Canon
s20         200    Nikon
a70         100    Canon
a70         150    Canon

R
Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon
45.
Lossy decomposition
• In the previous example, additional tuples are obtained along with the original tuples.
• Although there are more tuples, this leads to less information.
• Due to this loss of information, the decomposition in the previous example is called a lossy decomposition or lossy-join decomposition.
46.
Lossy decomposition (another example)

T
Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose

Functional dependencies: Employee → Branch, Project → Branch
47.
Lossy decomposition
Decomposition of the previous relation:

T1
Employee  Branch
Brown     L.A.
Green     San Jose
Hoskins   San Jose

T2
Project  Branch
Mars     L.A.
Jupiter  San Jose
Saturn   San Jose
Venus    San Jose
48.
Lossy decomposition

After Natural Join
Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose
Green     Saturn   San Jose
Hoskins   Jupiter  San Jose

Original Relation
Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose

After the natural join, we get two extra tuples. Thus, there is loss of information.
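The two extra tuples can be reproduced by projecting T onto (Employee, Branch) and (Project, Branch) and joining the projections on Branch. The following is a minimal Python sketch using sets of tuples; the variable names are chosen for illustration.

```python
# Original relation T as (Employee, Project, Branch) tuples.
T = {
    ("Brown", "Mars", "L.A."),
    ("Green", "Jupiter", "San Jose"),
    ("Green", "Venus", "San Jose"),
    ("Hoskins", "Saturn", "San Jose"),
    ("Hoskins", "Venus", "San Jose"),
}

# Projections: sets remove duplicate rows automatically.
T1 = {(employee, branch) for employee, _, branch in T}
T2 = {(project, branch) for _, project, branch in T}

# Natural join on the shared attribute Branch.
joined = {(employee, project, b1)
          for employee, b1 in T1
          for project, b2 in T2
          if b1 == b2}

spurious = joined - T
print(len(joined), sorted(spurious))
# 7 [('Green', 'Saturn', 'San Jose'), ('Hoskins', 'Jupiter', 'San Jose')]
```

Every original tuple is still present in the join, but it can no longer be told apart from the spurious ones, which is the sense in which information is lost.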
49.
What is lossless?
• Lossless means functioning without a loss; in other words, everything is retained.
• It is important for databases to have this feature.
50.
Lossless Decomposition Property
R: relation
F: set of functional dependencies on R
X, Y: decomposition of R
The decomposition is lossless if:
• X ∩ Y → X, that is, all attributes common to both X and Y functionally determine ALL the attributes in X
OR
• X ∩ Y → Y, that is, all attributes common to both X and Y functionally determine ALL the attributes in Y
In other words, if X ∩ Y forms a superkey of either X or Y, the decomposition of R is a lossless decomposition.
51.
Why lossless?
• The property ensures that the attributes involved in the natural join (X ∩ Y) are a superkey for at least one of the two relations.
• This ensures we can never get the situation where false tuples are generated, since for any value of the join attributes there will be a unique tuple in one of the relations.
52.
Lossless Decomposition
A decomposition is lossless if we can recover the original relation:
• Start with R(A,B,C).
• Decompose into R1(A,B) and R2(A,C).
• Recover R'(A,B,C), which should be the same as R(A,B,C).
We must ensure R' = R.
53.
Lossless Decomposition example
• Sometimes the same set of data is reproduced:

Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB

Name    Price
Word    100
Oracle  1000
Access  100

Name    Category
Word    WP
Oracle  DB
Access  DB

• (Word, 100) + (Word, WP) → (Word, 100, WP)
• (Oracle, 1000) + (Oracle, DB) → (Oracle, 1000, DB)
• (Access, 100) + (Access, DB) → (Access, 100, DB)
54.
Lossy Decomposition
• Sometimes it's not:

Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB

Category  Name
WP        Word
DB        Oracle
DB        Access

Category  Price
WP        100
DB        1000
DB        100

What's wrong?
• (Word, WP) + (100, WP) = (Word, 100, WP)
• (Oracle, DB) + (1000, DB) = (Oracle, 1000, DB)
• (Oracle, DB) + (100, DB) = (Oracle, 100, DB)
• (Access, DB) + (1000, DB) = (Access, 1000, DB)
• (Access, DB) + (100, DB) = (Access, 100, DB)
55.
Ensuring lossless decomposition
• R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
• R1(A1, ..., An, B1, ..., Bm)
• R2(A1, ..., An, C1, ..., Cp)
• If A1, ..., An → B1, ..., Bm or A1, ..., An → C1, ..., Cp, then the decomposition is lossless.
• Note: we don't need both.
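This test for a two-way decomposition can be sketched with an attribute closure: the shared attributes must determine all of R1 or all of R2. The example below applies it to the Name/Price/Category relation; the dependencies Name → Price and Name → Category are an assumption (the slides show only data in which Name is unique), and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the common attributes determine r1 or r2."""
    determined = closure(set(r1) & set(r2), fds)
    return set(r1) <= determined or set(r2) <= determined


# Assumed dependencies: Name is the key of the original relation.
fds = [({"Name"}, {"Price"}), ({"Name"}, {"Category"})]

print(is_lossless({"Name", "Price"}, {"Name", "Category"}, fds))      # True
print(is_lossless({"Name", "Category"}, {"Category", "Price"}, fds))  # False
```

Under these assumed dependencies, splitting on Name passes the test, while splitting on Category does not, because Category determines neither Name nor Price.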
56.
Dependency preservation
• Dependency preservation is a property of a decomposition that is separate from losslessness: the dependencies in F can be enforced by checking each decomposed relation on its own, so the normalized relvars are independent of each other.
• Some lossless decompositions do not exhibit dependency preservation.
• Let relation R(A,B,C,D) have dependencies F that include A ➙ B and A ➙ C.
• Decomposition: R1(A,B), R2(B,C,D)
• A ➙ C cannot be preserved using only one relation.
57.
• It may not be possible to preserve each and every dependency in F directly, but in a dependency-preserving decomposition the dependencies that are preserved are equivalent to F.
• Let F be the set of dependencies of relation R, and let R be decomposed into R1, R2, …, Rn.
• Let F1, F2, …, Fn be the partition of the dependencies such that each Fi involves only attributes of Ri.
• The decomposition preserves dependencies if F1 ⋃ F2 ⋃ … ⋃ Fn ➙ F, i.e., the union of the Fi implies every dependency in F.
• If the decomposition does not preserve the dependencies, then the decomposed relations do not satisfy F or
58.
Dependency Preserving Decompositions (Contd.)
• Decomposition of R into X and Y is dependency preserving if (FX ⋃ FY)+ = F+, i.e., if we consider only dependencies in the closure F+ that can be checked in X without considering Y, and in Y without considering X, these imply all dependencies in F+.
• It is important to consider F+ in this definition: take ABC with A → B, B → C, C → A, decomposed into AB and BC.
• Is this dependency preserving? Is C → A preserved?
• Note: F+ contains F ⋃ {A → C, B → A, C → B}, so…
• FAB contains A → B and B → A; FBC contains B → C and C → B.
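The question about C → A can be checked without enumerating F+. The sketch below uses a standard iterative test: starting from the left-hand side of each dependency, it repeatedly adds whatever can be derived inside a single decomposed relation. It is an illustrative Python sketch with hypothetical function names, and the second call assumes that F for R(A,B,C,D) consists of exactly A → B and A → C.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(fds, parts):
    """True if every FD in fds follows from the FDs local to the parts."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # Attributes derivable from what we have, seen only through this part.
                gained = closure(result & set(part), fds) & set(part)
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


abc = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"A"})]
print(preserves_dependencies(abc, [{"A", "B"}, {"B", "C"}]))  # True

abcd = [({"A"}, {"B"}), ({"A"}, {"C"})]  # assumed to be all of F
print(preserves_dependencies(abcd, [{"A", "B"}, {"B", "C", "D"}]))  # False
```

For ABC, C → A is recovered from C → B (local to BC) followed by B → A (local to AB), so the decomposition into AB and BC is dependency preserving even though C → A is not checkable in either relation directly.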
59.
Dependency Preservation
• Example: decompose (supplier, city, status), where supplier implies city and status, and city and status imply each other.
• Dependency is preserved in this projection: SC {S#, CITY}, CS {CITY, STATUS}
• Dependency is not preserved in this one: SC {S#, CITY}, CS {S#, STATUS}
• Although the second is nonloss, you still cannot update
60.
Dependency Preservation
• Ensures we can "easily" check whether an FD X → Y is violated during an update to a database.
• The projection of an FD set F onto a set of attributes Z, written FZ, is {X → Y | X → Y ∈ F+, X ⋃ Y ⊆ Z}, i.e., it is those FDs local to Z's attributes.
• A decomposition R1, …, Rk is dependency preserving if F+ = (FR1 ⋃ ... ⋃ FRk)+.
• The decomposition hasn't "lost" any essential FDs, so we can check without doing a join.
61.
Example of Lossless and Dependency-Preserving Decompositions
Given relation scheme R(cno, name, street, city, st, zip, item, price)
and FD set:
• cno → name
• name → street, city
• street, city → st
• street, city → zip
• name, item → price
Consider the decomposition R1(cno, name, street, city, st, zip) and R2(cno, name, item, price).
• Is it lossless?
• Is it dependency preserving?
• What if we replaced the first FD by name, street → city?
62.
Comparison of BCNF and 3NF
• It is always possible to decompose a relation into a set of relations that are in 3NF such that:
  • the decomposition is lossless
  • the dependencies are preserved
• It is always possible to decompose a relation into a set of relations that are in BCNF such that:
  • the decomposition is lossless
  • but it may not be possible to preserve dependencies.