DBMS - Normalization

  1. Normalization
     - Normalization is a technique for designing relational database tables that minimizes duplication of information.
     - It safeguards the database against logical and structural anomalies.
     - Normalization is also termed canonical synthesis.
     - It helps keep data consistent and preserves data integrity without any loss of information.
     - Because a normalized design spreads data over more tables, queries may need more join operations, which can sometimes degrade throughput.
     - The normal forms in practice include 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, DKNF and 6NF.
     Jitendra Tomar
  2. Normalization: Normal Forms
     - The normal forms (NF) of relational database theory provide criteria for determining how vulnerable a relation is to logical inconsistencies and anomalies.
     - A relation always meets the requirements of its highest normal form (HNF) and of all normal forms below it. By definition, it fails the requirements of every normal form above its HNF.
     - Normal forms apply to individual relations. The database as a whole is said to be in normal form n if all of its relations are in normal form n.
     - E.F. Codd introduced the first three normal forms, and 2NF and 3NF are defined in terms of functional dependencies (FDs). BCNF is likewise FD-based, whereas 4NF and 5NF rely on other kinds of dependency between the attributes of a relation (multivalued and join dependencies).
  3. Normalization
     According to Chris Date in "What First Normal Form Really Means", a table is an unnormalized relation if it violates any of the following rules:
     - There's no top-to-bottom ordering to the rows.
     - There's no left-to-right ordering to the columns.
     - There are no duplicate rows.
     - Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
     - All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].
  4. Normalization: Un-Normalized Relation
     - The relation below is unnormalized because the Contact_No cell of customer 456 holds two values, i.e. it is not atomic.

     Customer
     Cust_ID  Name            Contact_No
     123      Navin Kumar     01202536145
     456      Vikas Malhotra  01245847698, 01243265984
     789      Ashish Sharma   01125698745
  5. Normalization: First Normal Form
     - A relation is in First Normal Form (1NF) if it meets the following minimum set of criteria:
       - There's no top-to-bottom ordering of the rows.
       - There's no left-to-right ordering of the columns.
       - There are no duplicate rows.
       - Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
       - All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or timestamps].
     - These criteria ensure that the table is a faithful representation of a relation and that it is free of repeating groups.
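The two criteria that can be checked mechanically (atomic cells, no duplicate rows) can be sketched in Python as follows. This is an illustrative helper and not part of the original slides. The function name `first_nf_violations` and the choice to treat lists, tuples, sets and dicts as non-atomic are assumptions. The sample rows are the unnormalized Customer table from slide 4, and rows are counted from 0.

```python
# Assumption: a cell holding any of these container types is non-atomic.
NON_ATOMIC = (list, tuple, set, frozenset, dict)


def first_nf_violations(columns, rows):
    """Check two of the 1NF criteria on a table given as a list of tuples.

    Detects non-atomic cells and duplicate rows only. Row/column ordering
    and hidden columns cannot be observed from the data itself.
    """
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        for col, value in zip(columns, row):
            if isinstance(value, NON_ATOMIC):
                problems.append(f"row {i}: {col} holds {len(value)} values")
        key = repr(row)  # repr() also works for rows that contain lists
        if key in seen:
            problems.append(f"row {i}: duplicate row")
        seen.add(key)
    return problems


columns = ("Cust_ID", "Name", "Contact_No")
unnormalized = [
    (123, "Navin Kumar", "01202536145"),
    (456, "Vikas Malhotra", ["01245847698", "01243265984"]),
    (789, "Ashish Sharma", "01125698745"),
]
print(first_nf_violations(columns, unnormalized))
# ['row 1: Contact_No holds 2 values']
```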
  6. Normalization: First Normal Form
     - A novice structure for a Customer table:

     Customer
     Cust_ID  Name            Contact_No
     123      Navin Kumar     01202536145
     456      Vikas Malhotra  01245847698
     789      Ashish Sharma   01125698745

     - With this structure, the problem of atomicity arises as soon as a customer has more than one contact number, because the extra numbers have to be stored as repeating groups.
  7. Normalization: First Normal Form
     - Solution 1
       - The simplest approach is to allow more than one value in a cell, and this is exactly what makes it problematic.
       - It introduces repeating groups of values within a single domain.

     Customer
     Cust_ID  Name            Contact_No
     123      Navin Kumar     01202536145
     456      Vikas Malhotra  01245847698, 01243265984
     789      Ashish Sharma   01125698745
  8. Normalization: First Normal Form
     - Solution 2

     Customer
     Cust_ID  Name            Contact_No1  Contact_No2
     123      Navin Kumar     01202536145
     456      Vikas Malhotra  01245847698  01243265984
     789      Ashish Sharma   01125698745

     - Another approach is to define multiple contact columns, which leaves NULLs in the rows that do not use them.
     - This introduces repeating groups across columns.
     - Contact_No1, Contact_No2 and so on share the same domain and the same meaning.
  9. Normalization: First Normal Form
     - The split into columns is artificial and causes logical problems such as:
       - Difficulty in querying the table, e.g. "Which customers have telephone number X?" (every Contact_No column has to be checked).
       - A customer may mistakenly be given identical values for Contact_No1 and Contact_No2, creating redundancy.
       - The number of contact numbers per customer is restricted. Extra numbers go unrecorded, and the database design ends up imposing constraints on the business process rather than (as should ideally be the case) vice versa.
  10. Normalization: First Normal Form
     - Solution (a design that complies with 1NF)
       - Make two tables: one for the customer details and one for the contact details of the customer.
       - Repeating groups of contact details are absent in this design.

     Customer
     Cust_ID  Name
     123      Navin Kumar
     456      Vikas Malhotra
     789      Ashish Sharma

     Contact
     Cust_ID  Contact_No
     123      01202536145
     456      01245847698
     456      01243265984
     789      01125698745
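This is a minimal sketch of the 1NF design using Python's built-in sqlite3 module, assuming an in-memory SQLite database. The table and column names follow the slides. The helper `customers_with_number` is hypothetical. Contact numbers are stored as TEXT so that leading zeros survive. The composite primary key on Contact rules out the identical-numbers mistake described on slide 9.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Customer (
        Cust_ID INTEGER PRIMARY KEY,
        Name    TEXT NOT NULL
    );
    CREATE TABLE Contact (
        Cust_ID    INTEGER NOT NULL REFERENCES Customer(Cust_ID),
        Contact_No TEXT NOT NULL,          -- TEXT keeps the leading zero
        PRIMARY KEY (Cust_ID, Contact_No)  -- no repeated number per customer
    );
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?)", [
    (123, "Navin Kumar"), (456, "Vikas Malhotra"), (789, "Ashish Sharma")])
conn.executemany("INSERT INTO Contact VALUES (?, ?)", [
    (123, "01202536145"), (456, "01245847698"),
    (456, "01243265984"), (789, "01125698745")])


def customers_with_number(number):
    """Answer "Which customers have telephone number X?" with one predicate."""
    return conn.execute(
        "SELECT c.Cust_ID, c.Name FROM Customer c "
        "JOIN Contact k ON k.Cust_ID = c.Cust_ID "
        "WHERE k.Contact_No = ? ORDER BY c.Cust_ID",
        (number,),
    ).fetchall()


print(customers_with_number("01243265984"))  # [(456, 'Vikas Malhotra')]

# The composite primary key rejects giving a customer the same number twice.
try:
    conn.execute("INSERT INTO Contact VALUES (456, '01245847698')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

With this design, the question "Which customers have telephone number X?" needs only one predicate on one column, however many numbers a customer has.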
  12. Normalization: Functional Dependency (FD)
     - A functional dependency is a property of the information represented by a relation.
     - It is the most commonly encountered kind of relatedness between the data items of a database.
     - Usually, only the relatedness between attributes of a single relational table is considered.
     - An FD concerns how the values of one attribute (or set of attributes) depend on the values of another attribute (or set of attributes). This dependence constrains the two attributes or sets of attributes.
  13. Normalization: Functional Dependency
     - In a functional dependency diagram (FDD), attributes are drawn as rectangles and a dependency is drawn as a heavy arrow.
     - E.g. FD: Y → X (the FDD when X is functionally dependent on Y):

       [ Y ] ====> [ X ]

     - The arrow notation '→' is read as 'functionally determines'.
  14. Normalization: Functional Dependency
     - In general terms, a set of attributes X in a relational table T is functionally dependent on a set of attributes Y in T if each combination of values of the attributes in Y determines a unique (only one) combination of values of the attributes in X.
     - The attributes in Y are known as the determinant of the FD Y → X. The left-hand side of an FD is called the determinant and the right-hand side is called the dependent. Both are sets of attributes.
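Stated formally, the definition reads as follows. This uses standard textbook notation in which t[Y] denotes the values that a tuple t takes on the attributes in Y. That notation is an addition and is not used elsewhere in these slides.

```latex
Y \to X \text{ holds in } T
\iff
\forall\, t_1, t_2 \in T:\; t_1[Y] = t_2[Y] \implies t_1[X] = t_2[X]
```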
  15. Normalization: Functional Dependency
     - Example
     - Consider the functional dependency stating that a person works on only one machine on any given day:
       FD: {Person_Id, Date_Used} → {Machine_No}
       (FDD: Person_Id and Date_Used together point to Machine_No.)
  16. Normalization: Functional Dependency
     - Consider a functional dependency in the relation Assign:

     Assign
     Emp_No  Project  Yrs_Spent_on_Project
     106519  P1       5
     112233  P3       2
     106519  P2       4
     111222  P1       4

     - Here, if the values of Emp_No and Project are known, a unique value of Yrs_Spent_on_Project is also known:
       FD: {Emp_No, Project} → {Yrs_Spent_on_Project}
  17. Normalization: Full Functional Dependency
     - A set of attributes X is fully functionally dependent on a set of attributes Y if both of the following conditions hold:
       - X is functionally dependent on Y, and
       - X is not functionally dependent on any proper subset of Y.
     - Example: {Proj_Id} is functionally dependent on {Employee ID, Skill} but not fully functionally dependent on it, because it is also dependent on {Employee ID} alone.
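The following Python sketch shows how an FD and a full FD can be tested against a table instance. The helpers `fd_holds` and `is_full_fd` are hypothetical names. A single instance can only show that an FD is violated, so passing the test is evidence rather than proof. The Assign rows come from slide 16. The `skills` rows are invented purely to mirror the Proj_Id example above.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on every lhs attribute but differ on rhs.

    One table instance can only refute an FD; whether it holds in general is
    a property of what the data means, not of a single sample.
    """
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    # Every proper subset of lhs, including the empty set (size 0).
    for size in range(len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True


# The Assign relation from slide 16.
assign = [
    {"Emp_No": 106519, "Project": "P1", "Yrs_Spent_on_Project": 5},
    {"Emp_No": 112233, "Project": "P3", "Yrs_Spent_on_Project": 2},
    {"Emp_No": 106519, "Project": "P2", "Yrs_Spent_on_Project": 4},
    {"Emp_No": 111222, "Project": "P1", "Yrs_Spent_on_Project": 4},
]
YRS = ["Yrs_Spent_on_Project"]
print(fd_holds(assign, ["Emp_No", "Project"], YRS))    # True
print(fd_holds(assign, ["Emp_No"], YRS))               # False (106519 has 5 and 4)
print(is_full_fd(assign, ["Emp_No", "Project"], YRS))  # True

# Hypothetical rows: Proj_Id depends on Employee_ID alone.
skills = [
    {"Employee_ID": 1, "Skill": "SQL", "Proj_Id": "P1"},
    {"Employee_ID": 1, "Skill": "Java", "Proj_Id": "P1"},
    {"Employee_ID": 2, "Skill": "SQL", "Proj_Id": "P2"},
]
print(is_full_fd(skills, ["Employee_ID", "Skill"], ["Proj_Id"]))  # False
```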
  19. Normalization: Second Normal Form
     - A relation R is in second normal form (2NF) if
       - it is in 1NF, and
       - every non-prime attribute of R is fully functionally dependent on the primary key, i.e. no partial dependency is allowed in R.
     - In other words, no non-prime attribute of the relation should be functionally dependent on only one part of a concatenated (composite) primary key.
     - Thus, 2NF can be violated only when the key is a composite key consisting of more than one attribute.
     - 2NF is an intermediate step towards the higher normal forms. It eliminates the redundancy that partial dependencies cause in a 1NF relation.
  20. Normalization: Second Normal Form
     Example 1
     - Consider the relation PATIENT_DOCTOR given below. The relation is in 1NF.

     PATIENT_DOCTOR
     Patient_Name  DOB         Doctor_Name  Contact_No  Date_Time       Consult_Duration
     Ravi          10.02.1963  Abhishek     1122334     10.01.08 10:20  15
     Sanjay        05.08.1983  Prakash      2255886     10.01.08 11:00  10
     Ravi          10.02.1963  Manish       5566448     10.01.08 12:30  20

     - A doctor cannot have two simultaneous appointments, so DOCTOR_NAME and DATE_TIME together form a composite key.
  21. Normalization: Second Normal Form
     - Similarly, a patient cannot have appointments with two different doctors at the same time. Therefore, PATIENT_NAME and DATE_TIME together also form a candidate key.
     - The relation can be depicted as:
       PATIENT_DOCTOR (PATIENT_NAME, DOB, DOCTOR_NAME, DATE_TIME, CONTACT_NO, CONSULT_DURATION)
     - In this relation, the composite key (DOCTOR_NAME, DATE_TIME) is taken as the primary key.
     - However, CONTACT_NO is functionally dependent on DOCTOR_NAME alone, which is only part of the primary key. Because of this partial dependency, the relation is not in 2NF.
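The partial dependency can also be found mechanically using attribute closure. The sketch below assumes the FDs stated in these slides: {DOCTOR_NAME, DATE_TIME} is the primary key determining PATIENT_NAME and CONSULT_DURATION, DOCTOR_NAME → CONTACT_NO, and PATIENT_NAME → DOB (from slide 30). The helper names are hypothetical. Like the slides, the sketch treats only the chosen primary key as the key, whereas a complete 2NF test would examine every candidate key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(attributes, key, fds):
    """Map each proper, non-empty subset of `key` to the non-key attributes it determines."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(key, size):
            dependents = closure(part, fds) & non_key
            if dependents:
                found[part] = sorted(dependents)
    return found


attributes = ["PATIENT_NAME", "DOB", "DOCTOR_NAME", "DATE_TIME",
              "CONTACT_NO", "CONSULT_DURATION"]
key = ("DOCTOR_NAME", "DATE_TIME")
fds = [
    (("DOCTOR_NAME", "DATE_TIME"), ("PATIENT_NAME", "CONSULT_DURATION")),
    (("DOCTOR_NAME",), ("CONTACT_NO",)),
    (("PATIENT_NAME",), ("DOB",)),
]
print(partial_dependencies(attributes, key, fds))
# {('DOCTOR_NAME',): ['CONTACT_NO']}
```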
  22. Normalization: Second Normal Form
     - To bring the relation into 2NF, the information about doctors and their contact numbers has to be separated from the information about patients and their appointments with doctors. The relation is therefore decomposed into two tables, PATIENT_DOCTOR and DOCTOR.
     - The relations in 2NF can be depicted as:
       PATIENT_DOCTOR (PATIENT_NAME, DOB, DOCTOR_NAME, DATE_TIME, CONSULT_DURATION)
       DOCTOR (DOCTOR_NAME, CONTACT_NO)
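  23. Normalization: Second Normal Form
     FFD of the relation in 1NF:
       PATIENT_DOCTOR: {Doctor_Name, Date_Time} → Patient_Name, DOB, Cons_Duration; Doctor_Name → Contact_No
     FFD of the relations in 2NF:
       PATIENT_DOCTOR: {Doctor_Name, Date_Time} → Patient_Name, DOB, Cons_Duration
       DOCTOR: Doctor_Name → Contact_No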
  24. Normalization: Second Normal Form
     Problems in the above example:
     - Deleting a record from the relation PATIENT_DOCTOR may lose the patient's details (for instance, deleting a patient's only appointment also deletes that patient's DOB).
     - Any change to a patient's details may involve changing multiple occurrences, because this information is still stored redundantly.
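Both problems can be reproduced with a small sqlite3 sketch of the 2NF PATIENT_DOCTOR table (with CONTACT_NO already moved to DOCTOR), using the sample rows from slide 20. The in-memory database, the column types and the helper `dob_copies` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 2NF PATIENT_DOCTOR from the slides: CONTACT_NO now lives in DOCTOR.
conn.execute("""CREATE TABLE PATIENT_DOCTOR (
    PATIENT_NAME TEXT, DOB TEXT, DOCTOR_NAME TEXT, DATE_TIME TEXT,
    CONSULT_DURATION INTEGER,
    PRIMARY KEY (DOCTOR_NAME, DATE_TIME))""")
conn.executemany("INSERT INTO PATIENT_DOCTOR VALUES (?, ?, ?, ?, ?)", [
    ("Ravi", "10.02.1963", "Abhishek", "10.01.08 10:20", 15),
    ("Sanjay", "05.08.1983", "Prakash", "10.01.08 11:00", 10),
    ("Ravi", "10.02.1963", "Manish", "10.01.08 12:30", 20),
])


def dob_copies(name):
    """Number of rows that repeat this patient's DOB (the update anomaly)."""
    return conn.execute(
        "SELECT COUNT(*) FROM PATIENT_DOCTOR WHERE PATIENT_NAME = ?", (name,)
    ).fetchone()[0]


print(dob_copies("Ravi"))  # 2: changing Ravi's DOB means updating two rows

# Deletion anomaly: removing Sanjay's only appointment also erases his DOB.
conn.execute("DELETE FROM PATIENT_DOCTOR "
             "WHERE DOCTOR_NAME = 'Prakash' AND DATE_TIME = '10.01.08 11:00'")
print(conn.execute(
    "SELECT DOB FROM PATIENT_DOCTOR WHERE PATIENT_NAME = 'Sanjay'").fetchall())  # []
```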
  26. Normalization: Third Normal Form
     - A relation R is in Third Normal Form (3NF) if
       - it is in 2NF, and
       - its non-prime attributes are
         - mutually independent, and
         - functionally dependent on the primary (or relation) key.
     - No attribute of the relation should be transitively functionally dependent on the primary key. In other words, no non-prime attribute is functionally dependent on another non-prime attribute.
     - This means that a relation in 3NF consists of the primary key and a set of mutually independent non-prime attributes.
  27. Normalization: Third Normal Form
     - The relation in 1NF was depicted as:
       PATIENT_DOCTOR (PATIENT_NAME, DOB, DOCTOR_NAME, DATE_TIME, CONTACT_NO, CONSULT_DURATION)
     - The relations in 2NF can be depicted as:
       PATIENT_DOCTOR (PATIENT_NAME, DOB, DOCTOR_NAME, DATE_TIME, CONSULT_DURATION)
       DOCTOR (DOCTOR_NAME, CONTACT_NO)
  29. Normalization
     The FFD in 1NF and 2NF was given as below:
       1NF:
         PATIENT_DOCTOR: {Doctor_Name, Date_Time} → Patient_Name, DOB, Cons_Duration; Doctor_Name → Contact_No
       2NF:
         PATIENT_DOCTOR: {Doctor_Name, Date_Time} → Patient_Name, DOB, Cons_Duration
         DOCTOR: Doctor_Name → Contact_No
  30. Normalization: Third Normal Form
     - The relation still has a problem: the attribute DOB is not directly functionally dependent on the composite primary key {Doctor_Name, Date_Time}.
     - In fact, DOB is functionally dependent on Patient_Name, which is not part of that primary key. DOB therefore depends on the key only transitively, via Patient_Name.
     - Thus Patient_Name and DOB are not mutually independent.
     - Since in 3NF the non-key attributes must be mutually independent and no transitive dependency may occur, the relation violates the requirements of 3NF.
     - The relation has to be decomposed to remove the parts that are not directly dependent on the primary key.
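The transitive dependency can be detected with the same closure idea: an FD that breaks 3NF has a left side that is not a superkey but still determines a non-key attribute. The following is a hedged Python sketch. It assumes the FDs from slides 21 and 30 for the 2NF PATIENT_DOCTOR table and, as the slides do, counts only the primary-key attributes as key attributes. `third_nf_violations` is a hypothetical helper.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, key, fds):
    """FDs lhs -> rhs where lhs is not a superkey and rhs has a non-key attribute."""
    non_key = set(attributes) - set(key)
    problems = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        bad = (set(rhs) - set(lhs)) & non_key
        if not is_superkey and bad:
            problems.append((tuple(lhs), tuple(sorted(bad))))
    return problems


# 2NF PATIENT_DOCTOR: DOB reaches the key only through PATIENT_NAME.
attributes = ["PATIENT_NAME", "DOB", "DOCTOR_NAME", "DATE_TIME", "CONSULT_DURATION"]
key = ("DOCTOR_NAME", "DATE_TIME")
fds = [
    (("DOCTOR_NAME", "DATE_TIME"), ("PATIENT_NAME", "CONSULT_DURATION")),
    (("PATIENT_NAME",), ("DOB",)),
]
print(third_nf_violations(attributes, key, fds))
# [(('PATIENT_NAME',), ('DOB',))]
```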
  31. Normalization
     The FFD in 2NF and 3NF is given below:
       2NF:
         PATIENT_DOCTOR: {Doctor_Name, Date_Time} → Patient_Name, DOB, Cons_Duration
         DOCTOR: Doctor_Name → Contact_No
       3NF:
         PATIENT_DOCTOR: {Doctor_Name, Date_Time} → Patient_Name, Cons_Duration
         PATIENT: Patient_Name → DOB
         DOCTOR: Doctor_Name → Contact_No
  32. Normalization: Third Normal Form
     - Thus the relations in 3NF are depicted as:
       PATIENT (PATIENT_NAME, DOB)
       PATIENT_DOCTOR (PATIENT_NAME, DOCTOR_NAME, DATE_TIME, CONSULT_DURATION)
       DOCTOR (DOCTOR_NAME, CONTACT_NO)
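The following sketch builds the 3NF schema in SQLite (via Python's sqlite3 module, in memory) and loads it with the slide-20 sample data. Joining the three tables on PATIENT_NAME and DOCTOR_NAME rebuilds the original six-column rows, which illustrates that this decomposition loses no information. Ravi's DOB is now stored once. The column types are assumptions, and dates are kept as the slides' text strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE PATIENT (PATIENT_NAME TEXT PRIMARY KEY, DOB TEXT);
    CREATE TABLE DOCTOR  (DOCTOR_NAME  TEXT PRIMARY KEY, CONTACT_NO TEXT);
    CREATE TABLE PATIENT_DOCTOR (
        PATIENT_NAME     TEXT REFERENCES PATIENT(PATIENT_NAME),
        DOCTOR_NAME      TEXT REFERENCES DOCTOR(DOCTOR_NAME),
        DATE_TIME        TEXT,
        CONSULT_DURATION INTEGER,
        PRIMARY KEY (DOCTOR_NAME, DATE_TIME)
    );
""")
conn.executemany("INSERT INTO PATIENT VALUES (?, ?)",
                 [("Ravi", "10.02.1963"), ("Sanjay", "05.08.1983")])
conn.executemany("INSERT INTO DOCTOR VALUES (?, ?)",
                 [("Abhishek", "1122334"), ("Prakash", "2255886"), ("Manish", "5566448")])
conn.executemany("INSERT INTO PATIENT_DOCTOR VALUES (?, ?, ?, ?)", [
    ("Ravi", "Abhishek", "10.01.08 10:20", 15),
    ("Sanjay", "Prakash", "10.01.08 11:00", 10),
    ("Ravi", "Manish", "10.01.08 12:30", 20),
])

# Joining the three tables rebuilds the original 1NF rows.
rows = conn.execute("""
    SELECT p.PATIENT_NAME, p.DOB, d.DOCTOR_NAME, d.CONTACT_NO,
           a.DATE_TIME, a.CONSULT_DURATION
    FROM PATIENT_DOCTOR a
    JOIN PATIENT p ON p.PATIENT_NAME = a.PATIENT_NAME
    JOIN DOCTOR  d ON d.DOCTOR_NAME  = a.DOCTOR_NAME
    ORDER BY a.DATE_TIME
""").fetchall()
for row in rows:
    print(row)
```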
  33. Normalization: Third Normal Form
     - FFD of the relations in 3NF:
       PATIENT_DOCTOR: {Doctor_Name, Date_Time} → Patient_Name, Cons_Duration
       PATIENT: Patient_Name → DOB
       DOCTOR: Doctor_Name → Contact_No
