09.01 normalization


Published in: Technology, Education

  1. 1. 9.0 Normalization DBMS Bishal
  2. 2. 9.1 Introduction
     • The aim of normalization is to reduce redundancy, meaning that each piece of information is stored only once.
     • Relations are normalized so that when they are altered during the lifetime of the database, we do not lose information or introduce inconsistencies.
     • The alterations normally needed are:
       - Insertion, which should be possible without being forced to leave blank fields for some attributes.
       - Deletion, which should be possible without unknowingly losing vital information.
       - Updating, which should be possible without exhaustively searching all the tuples in the relation.
  3. 3. 9.2 Functional Dependency
     • Let X and Y be two attributes of a relation. If, for any given value of X, there is only one value of Y corresponding to it, then Y is said to be functionally dependent on X. This is indicated by the notation: X → Y
     • For example, given the value of item code, there is only one value of item name for it. Thus item name is functionally dependent on item code. This is shown as: Item code → Item name
     • A functional dependency may also be based on a composite attribute. For example, if we write X,Z → Y
  4. 4. It means that there is only one value of Y corresponding to given values of X,Z. In other words, Y is functionally dependent on the composite X,Z.
     • Student (Roll_no, Name, Address, Dept, Year_of_study)
     • In this relation, Name is functionally dependent on Roll_no. In fact, given the value of Roll_no, the values of all other attributes can be uniquely determined.
     • Dept is not functionally dependent on Name, because given the name of a student, one cannot find the student's department uniquely: there may be more than one student with the same name. Name in this case is not a key.
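The definition above can be checked mechanically against sample data. The following is a minimal sketch, assuming rows are represented as Python dictionaries; the helper name `holds` and the sample students are illustrative and not part of the slides. Note that a dependency is a rule about every legal state of a relation, so data can refute a dependency but can never prove it.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs,
    i.e. the sample data does not violate lhs -> rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault stores the first dependent value seen for this determinant
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical sample rows: two different students share the name "Asha".
students = [
    {"Roll_no": 1, "Name": "Asha", "Dept": "Physics"},
    {"Roll_no": 2, "Name": "Ravi", "Dept": "Chemistry"},
    {"Roll_no": 3, "Name": "Asha", "Dept": "Maths"},
]

roll_determines_name = holds(students, ["Roll_no"], ["Name"])
name_determines_dept = holds(students, ["Name"], ["Dept"])
```

Here `roll_determines_name` is true, while `name_determines_dept` is false because the two students named Asha belong to different departments.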
  5. 5. Relation Key
     • Given a relation, if the value of an attribute X uniquely determines the values of all other attributes in a row, then X is said to be the key of that relation.
     • Sometimes more than one attribute is needed to uniquely determine the other attributes in a relation row. In that case, such a set of attributes is the key.
     • [Figure: Student relation with attributes RollNo, Name, Address, Department, YearOfStudy]
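As a rough illustration of a composite key, the sketch below tests whether a set of attributes takes a distinct combination of values in every row. The function name `is_unique` and the enrolment rows are hypothetical. As with functional dependencies, uniqueness in one sample is only a necessary condition for being a key, not a proof.

```python
def is_unique(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


# Hypothetical enrolment rows: a student takes several courses,
# and a course is taken by several students.
enrolments = [
    {"Name": "Jones", "Course": 353, "Grade": "A"},
    {"Name": "Jones", "Course": 328, "Grade": "B"},
    {"Name": "Ng", "Course": 353, "Grade": "B"},
]

name_alone = is_unique(enrolments, ["Name"])
course_alone = is_unique(enrolments, ["Course"])
name_and_course = is_unique(enrolments, ["Name", "Course"])
```

Neither `Name` nor `Course` identifies a row on its own, but the pair does, so in this sample only the set of both attributes can serve as the key.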
  6. 6. 9.3 Anomalies in a Database
     STDINF (Name, Course, Phone_No, Major, Prof, Grade)
     Dependencies: Name → Phone_No; Name → Major; Name,Course → Grade; Course → Prof

     Name    Course  Phone_No  Major         Prof    Grade
     Jones   353     237-4539  Comp Sci      Smith   A
     Ng      329     427-7390  Chemistry     Turner  B
     Jones   328     237-4539  Comp Sci      Clark   B
     Martin  456     388-5183  Physics       James   A
     Dulles  293     371-6259  Decision Sci  Cook    C
  7. 7. 9.3 Anomalies in a Database
     • Here the attribute Phone_No, which is not in any key of the relation scheme STDINF, depends not on the whole key but on only one part of it, namely the attribute Name. Similarly, the attributes Major and Prof, which are not in any key of STDINF either, are fully functionally dependent on the attributes Name and Course, respectively.
     • Thus the determinants of these functional dependencies are again not the entire key, but only part of the key of the relation. Only the attribute Grade is fully functionally dependent on the key (Name, Course).
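The classification above can be sketched in a few lines of Python. This is a simplified illustration that works directly from the dependencies listed for STDINF and does not compute attribute closures; the function name `partial_dependencies` is an assumption made for this example.

```python
# Key and functional dependencies of STDINF, as given in the slides.
KEY = frozenset({"Name", "Course"})
FDS = [
    ({"Name"}, "Phone_No"),
    ({"Name"}, "Major"),
    ({"Course"}, "Prof"),
    ({"Name", "Course"}, "Grade"),
]


def partial_dependencies(key, fds):
    """Map each non-key attribute to its determinant when that
    determinant is a proper subset of the key."""
    found = {}
    for determinant, attribute in fds:
        determinant = frozenset(determinant)
        # "<" on frozensets tests for a proper subset
        if attribute not in key and determinant < key:
            found[attribute] = sorted(determinant)
    return found


partial = partial_dependencies(KEY, FDS)
```

With these inputs, `partial` reports Phone_No and Major as depending on Name alone and Prof as depending on Course alone; Grade is absent because its determinant is the whole key.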
  8. 8. Redundancy
     • The aim of the database system is to reduce redundancy, meaning that information is to be stored only once. Storing information several times wastes storage space and increases the total size of the data stored.
     • Updates to a database with such redundancies can leave it inconsistent, as explained below. In the STDINF relation, the Major and Phone_No of a student are stored several times in the database: once for each course that is or was taken by the student.
  9. 9. Update Anomalies
     • Multiple copies of the same fact may lead to update anomalies or inconsistencies when an update is made and only some of the copies are updated. Thus, a change in the Phone_No of Jones must be made, for consistency, in all tuples pertaining to the student Jones. If any one of these tuples is not changed to reflect the new Phone_No of Jones, there will be an inconsistency in the data.
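A small sketch of the update anomaly follows, using the two Jones tuples and the Ng tuple from the STDINF table. The helper name `inconsistent_students` and the new phone number are hypothetical.

```python
# Three STDINF tuples, reduced to the attributes relevant here.
stdinf = [
    {"Name": "Jones", "Course": 353, "Phone_No": "237-4539"},
    {"Name": "Ng", "Course": 329, "Phone_No": "427-7390"},
    {"Name": "Jones", "Course": 328, "Phone_No": "237-4539"},
]


def inconsistent_students(rows):
    """Return the names that appear with more than one phone number."""
    phones = {}
    for row in rows:
        phones.setdefault(row["Name"], set()).add(row["Phone_No"])
    return sorted(name for name, numbers in phones.items() if len(numbers) > 1)


before = inconsistent_students(stdinf)

# Update only the first Jones tuple (hypothetical new number)
# and forget the second copy of the same fact.
stdinf[0]["Phone_No"] = "555-0100"

after = inconsistent_students(stdinf)
```

Before the partial update no student has conflicting phone numbers; afterwards Jones appears with two different numbers, which is exactly the inconsistency described above.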
  10. 10. Insertion Anomalies
      • If this is the only relation in the database showing the association between a faculty member and the course he or she teaches, the fact that a given professor is teaching a given course cannot be entered in the database unless a student is registered in the course.
      • Also, if another relation establishes a relationship between a course and the professor who teaches that course, the information stored in these relations has to be consistent.
      • [Figure: STDINF relation with attributes Name, Course, Phone_No, Major, Prof, Grade]
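The insertion anomaly can be reproduced with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the key attributes are declared NOT NULL, the course number 500 and professor Lee are invented for the example, and the separate table `COURSE_PROF` is a hypothetical remedy rather than something defined in the slides at this point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE STDINF (
        Name     TEXT    NOT NULL,
        Course   INTEGER NOT NULL,
        Phone_No TEXT,
        Major    TEXT,
        Prof     TEXT,
        Grade    TEXT,
        PRIMARY KEY (Name, Course)
    )
    """
)


def can_record_course_without_student(connection):
    """Try to store 'Lee teaches course 500' with no student enrolled."""
    try:
        connection.execute(
            "INSERT INTO STDINF (Name, Course, Prof) VALUES (NULL, 500, 'Lee')"
        )
    except sqlite3.IntegrityError:
        # Name is part of the key, so it cannot be left empty.
        return False
    return True


accepted_in_stdinf = can_record_course_without_student(conn)

# A separate course-to-professor relation accepts the same fact.
conn.execute("CREATE TABLE COURSE_PROF (Course INTEGER PRIMARY KEY, Prof TEXT)")
conn.execute("INSERT INTO COURSE_PROF (Course, Prof) VALUES (500, 'Lee')")
stored_prof = conn.execute(
    "SELECT Prof FROM COURSE_PROF WHERE Course = 500"
).fetchone()[0]
```

The insert into STDINF is rejected because part of the key would be empty, whereas the separate relation stores the course and its professor without needing any student.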
  11. 11. Deletion Anomalies
      • If the only student registered in a given course discontinues the course, the information as to which professor is offering the course will be lost, if this is the only relation in the database showing the association between a faculty member and the course she or he teaches. If another relation in the database also establishes the relationship between a course and the professor who teaches that course, the deletion of the last tuple in STDINF for a given course will not cause the information about the course's teacher to be lost.
      • The problems of database inconsistency and redundancy of data are similar to the problems that exist in the hierarchical and network models. These problems are addressed in the network model by the introduction of virtual fields and in the hierarchical model by the
  12. 12. Decomposition
      • The decomposition of a relation scheme R = (A1, A2, ..., An) is its replacement by a set of relation schemes {R1, R2, ..., Rm} such that Ri ⊆ R for 1 ≤ i ≤ m and R1 ∪ R2 ∪ ... ∪ Rm = R.
      • A relation scheme R can be decomposed into a collection of relation schemes {R1, R2, R3, ..., Rm} to eliminate some of the anomalies contained in the original relation R. Here the relation schemes Ri (1 ≤ i ≤ m) are subsets of R, and the intersection Ri ∩ Rj for i ≠ j need not be empty. Furthermore, the union of the Ri (1 ≤ i ≤ m) is equal to R, i.e. R = R1 ∪ R2 ∪ ... ∪ Rm.
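The two conditions in this definition translate directly into set operations. The following is a minimal sketch; the function name `is_decomposition` and the attribute names A to E are placeholders chosen for the example.

```python
def is_decomposition(scheme, parts):
    """Check that every part is a subset of scheme and that
    the union of all parts equals scheme."""
    scheme = set(scheme)
    parts = [set(part) for part in parts]
    if not parts:
        return False
    every_part_is_subset = all(part <= scheme for part in parts)
    union_covers_scheme = set().union(*parts) == scheme
    return every_part_is_subset and union_covers_scheme


R = {"A", "B", "C", "D"}

overlapping_parts = is_decomposition(R, [{"A", "B"}, {"B", "C", "D"}])
missing_attribute = is_decomposition(R, [{"A", "B"}, {"C"}])
foreign_attribute = is_decomposition(R, [{"A", "E"}, {"B", "C", "D"}])
```

The first call succeeds even though the two parts share attribute B, since the intersection need not be empty. The second fails because D is lost, and the third fails because E is not an attribute of R.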
  13. 13. Decomposition
      • STUDENT_INFO (Name, Phone_No, Major)
      • TRANSCRIPT (Name, Course, Grade)
      • TEACHER (Course, Prof)
      STUDENT_INFO (Name, Phone_No, Major): The first relation scheme gives the phone number and the major of each student, and such information will be stored only once for each student. Any change in the phone number will thus require a change in only one tuple of this relation.
      TRANSCRIPT (Name, Course, Grade): The second relation scheme stores the grade of each student in each course that the student is or was enrolled in.
      TEACHER (Course, Prof): The third relation scheme records the teacher of each course.
      One of the disadvantages of replacing the original relation scheme STDINF with the three relation schemes is that the retrieval of certain information requires a natural join operation to be performed. For instance, finding the major of each student who obtained a grade of A in course 353 requires a join: (STUDENT_INFO ⋈ TRANSCRIPT). The same information could be derived from the original relation STDINF by selection and projection alone.
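The decomposition and the join query described above can be sketched in plain Python using the five STDINF tuples from the earlier table. The helpers `project` and `natural_join` are simplified illustrations written for this example, not a database engine's implementation.

```python
COLUMNS = ("Name", "Course", "Phone_No", "Major", "Prof", "Grade")
ROWS = [
    ("Jones", 353, "237-4539", "Comp Sci", "Smith", "A"),
    ("Ng", 329, "427-7390", "Chemistry", "Turner", "B"),
    ("Jones", 328, "237-4539", "Comp Sci", "Clark", "B"),
    ("Martin", 456, "388-5183", "Physics", "James", "A"),
    ("Dulles", 293, "371-6259", "Decision Sci", "Cook", "C"),
]
stdinf = [dict(zip(COLUMNS, row)) for row in ROWS]


def project(rows, attrs):
    """Projection onto attrs with duplicate tuples removed."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


def natural_join(left, right):
    """Combine rows that agree on every attribute the two relations share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


student_info = project(stdinf, ("Name", "Phone_No", "Major"))
transcript = project(stdinf, ("Name", "Course", "Grade"))
teacher = project(stdinf, ("Course", "Prof"))

# Major of each student who obtained grade A in course 353.
joined = natural_join(student_info, transcript)
majors = {r["Major"] for r in joined if r["Course"] == 353 and r["Grade"] == "A"}

# Joining all three relations gives back the original tuples.
rebuilt = natural_join(joined, teacher)
```

In this sample `student_info` holds Jones only once, the query yields Comp Sci, and `rebuilt` contains the same five tuples as STDINF, so no information is lost by this particular decomposition.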
  14. 14. Decomposition
      • When we replace the original scheme STDINF with the relation schemes STUDENT_INFO, TRANSCRIPT and TEACHER, the consistency and referential integrity constraints have to be enforced. Referential integrity enforcement implies that if a tuple exists in the relation TRANSCRIPT, such as (Jones, 353, in prog), a tuple must exist in STUDENT_INFO with Name = Jones and, furthermore, a tuple must exist in TEACHER with Course = 353.
      • The attribute Name, which forms part of the key of the relation TRANSCRIPT, is a key of the relation STUDENT_INFO. Such an attribute (or group of attributes), which establishes a relationship between specific tuples (of the same or two distinct relations), is called a foreign key. Notice that the attribute Course in relation TRANSCRIPT is also a foreign key, since it is
  15. 15. Decomposition
      The decomposition of STDINF into the relation schemes STUDENT (Name, Phone_No, Major, Grade) and COURSE (Course, Prof) is a bad decomposition for the following reasons:
      1. Redundancy and update anomaly, because the data for the attributes Phone_No and Major are repeated.
      2. Loss of information, because we lose the fact that a student has a given grade in a particular course.
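The loss of information can be made concrete with the five STDINF tuples from the earlier table. In this sketch, relations are modelled as Python sets of tuples; because STUDENT and COURSE share no attribute, their natural join degenerates into a Cartesian product, which is what the code computes. The variable names are illustrative.

```python
from itertools import product

# STDINF tuples: (Name, Course, Phone_No, Major, Prof, Grade)
stdinf = {
    ("Jones", 353, "237-4539", "Comp Sci", "Smith", "A"),
    ("Ng", 329, "427-7390", "Chemistry", "Turner", "B"),
    ("Jones", 328, "237-4539", "Comp Sci", "Clark", "B"),
    ("Martin", 456, "388-5183", "Physics", "James", "A"),
    ("Dulles", 293, "371-6259", "Decision Sci", "Cook", "C"),
}

# The two projections of the bad decomposition.
student = {(n, ph, m, g) for (n, c, ph, m, p, g) in stdinf}
course = {(c, p) for (n, c, ph, m, p, g) in stdinf}

# No shared attribute, so every STUDENT tuple pairs with every COURSE tuple.
rejoined = {
    (n, c, ph, m, p, g)
    for (n, ph, m, g), (c, p) in product(student, course)
}

# Tuples that the join produces but that were never in STDINF.
spurious = rejoined - stdinf
```

The join returns 25 tuples instead of the original 5. The 20 spurious ones, such as Jones receiving an A in course 329, show that the link between a grade and its course can no longer be recovered. The redundancy also remains, since Jones's phone number and major still appear twice in `student`.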
  16. 16. Database Normalization
      • Database normalization is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity, and scalability.
      • In the relational model, methods exist for quantifying how efficient a database is. These classifications are called normal forms (or NF), and there are algorithms for converting a given database between them.
      • Normalization generally involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query is issued.
  17. 17. History
      • Edgar F. Codd first proposed the process of normalization and what came to be known as the 1st normal form in his paper "A Relational Model of Data for Large Shared Data Banks".
      • Codd stated: "There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition nonsimple domains are replaced by 'domains whose elements are atomic (nondecomposable) values.'"
  18. 18. Normal Form
      • Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF. There are now others that are generally accepted, but 3NF is widely considered to be sufficient for most applications.
      • Most tables, when reaching 3NF, are also in BCNF (Boyce-Codd Normal Form).