1NF Violation & Solution
Violation: Repeating groups (ORDER)
Solution: Split into two entities (ORDER and ORDER ITEM, linked by a foreign key)
2NF Violation and Solution
Violation: Description and Unit Price do not depend on the full primary key (ORDER ITEM)
Solution: Split into two entities (ORDER ITEM and PRODUCT, linked by a foreign key)
3NF Violation & Solution
Violation: Address and Credit Limit do not depend on OrderID, but on Customer Name (ORDER)
Solution: Create a separate entity for Customer (ORDER and CUSTOMER, linked by a foreign key)
Final Solution
Un-normalised entity: ORDER
Normalised entities: ORDER, CUSTOMER, ORDER ITEM, PRODUCT (linked by foreign keys)
Information about Customers and Products can be recorded even when there are no Orders.
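The final solution can be sketched as four tables. This is a minimal illustration using Python's built-in sqlite3 module, not the deck's own schema: the table name orders, the columns product_id and quantity, and all sample values are assumptions made for the example. Customer is keyed by name only because the 3NF slide says Address and Credit Limit depend on Customer Name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_name TEXT PRIMARY KEY,
    address       TEXT,
    credit_limit  REAL
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,  -- assumed key name
    description TEXT,
    unit_price  REAL
);
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL REFERENCES customer(customer_name)
);
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,      -- assumed attribute
    PRIMARY KEY (order_id, product_id)
);
""")

# Customers and products can be recorded before any order exists.
conn.execute("INSERT INTO customer VALUES ('Acme Ltd', '1 High Street', 5000.0)")
conn.execute("INSERT INTO product VALUES (1, 'Widget', 2.50)")

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
product_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(customer_count, product_count, order_count)
```

In the single un-normalised ORDER entity, the same customer and product could not be stored without inventing an order row to hold them.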
Editor's Notes
Normalized data models are often referred to as relational models. However, star schemas and snowflake schemas may also be implemented on top of relational database management systems.

Normalization is the process of removing redundancy in data by separating the data into multiple tables, thereby designing for efficient and reliable single-record access. Relational database theorists have created rules by which the degree of normalization is measured. These degrees are called normal forms, with 3rd normal form commonly accepted as the minimum degree of normalization. Normalization beyond 3rd normal form is often sacrificed because of hardware limitations.

A properly normalized relational data model allows the efficient use of storage space, elimination of redundant data, reduction or elimination of inconsistent data, and minimization of the data maintenance burden. However, an “over-normalized” data model may cause performance concerns: accessing the data requires large table joins, which slow response time.

A data model is in 3rd Normal Form when the following are true:
- Repeating groups of data are removed (1st normal form).
- Redundant data is removed (2nd normal form).
- Attributes of an entity depend upon the key, the whole key, and nothing but the key.

Once the model has been normalized to at least 3rd Normal Form, the following are true:
- The structure is remarkably insensitive to change.
- Structural paths for accessing information are very clear.
- Create, Report, Update and Delete anomalies are eliminated.
- Performance can be an issue.
- The structure can be very complex.
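The point about inconsistent data can be illustrated with a small sketch. The rows and names below are invented for the example. When an address is repeated on every order row, updating one copy leaves two conflicting values; when it is stored once and referenced by key, a single update is seen everywhere.

```python
# Un-normalized: the customer's address is repeated on every order row.
flat_orders = [
    {"order_id": 1, "customer": "Acme Ltd", "address": "1 High Street"},
    {"order_id": 2, "customer": "Acme Ltd", "address": "1 High Street"},
]
flat_orders[0]["address"] = "9 New Road"  # only one copy gets updated
addresses_seen = {
    row["address"] for row in flat_orders if row["customer"] == "Acme Ltd"
}

# Normalized: the address is stored once; orders refer to the customer by key.
customers = {"Acme Ltd": {"address": "1 High Street"}}
orders = [
    {"order_id": 1, "customer": "Acme Ltd"},
    {"order_id": 2, "customer": "Acme Ltd"},
]
customers["Acme Ltd"]["address"] = "9 New Road"  # one update, one place
normalized_addresses = {customers[o["customer"]]["address"] for o in orders}

print(len(addresses_seen))        # 2: the flat rows now disagree
print(len(normalized_addresses))  # 1: every order sees the same address
```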
Normalized relational data modeling is the classic modeling technique used for organizing entities defined by unique identifiers and attributes that are wholly dependent upon those identifiers. This is the modeling technique that database administrators and modelers are most familiar with, and it is most commonly associated with transaction systems development.

Normalization is the process of removing redundancy in data by separating the data into multiple tables. There are well-established rules of normalization:
- Eliminate Repeating Groups. Make a separate table for each set of related attributes, and give each table a primary key. (1st Normal Form)
- Eliminate Redundant Data. If an attribute depends on only part of a multi-valued (composite) key, remove it to a separate table. (2nd Normal Form)
- Eliminate Columns Not Dependent on Key. If attributes do not contribute to a description of the key, remove them to a separate table. (3rd Normal Form)
- Isolate Independent Multiple Relationships. No table may contain two or more 1:n or n:m relationships that are not directly related. (4th Normal Form)
- Isolate Semantically Related Multiple Relationships. There may be practical constraints on information that justify separating logically related many-to-many relationships. (5th Normal Form)

The last two rules, 4th and 5th Normal Forms, are not often attained. It is not uncommon, in fact, to denormalize from 3rd Normal Form in the physical model to address performance concerns. Consequently, the rest of this section will not cover these two forms.
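The 2nd and 3rd Normal Form rules both turn on which attributes depend on which others. As a rough aid, the sketch below tests whether sample rows are consistent with a given dependency. The helper name depends_on and the order-item rows are hypothetical. Sample data can only refute a dependency, never prove it; the real dependencies come from the business rules.

```python
def depends_on(rows, determinant, attribute):
    """Return True if, in these rows, equal determinant values always
    come with the same value of attribute."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        if key in seen and seen[key] != row[attribute]:
            return False
        seen[key] = row[attribute]
    return True


# Hypothetical order-item rows with composite key (order_id, product_id).
rows = [
    {"order_id": 1, "product_id": 10, "description": "Widget", "quantity": 2},
    {"order_id": 1, "product_id": 11, "description": "Gadget", "quantity": 1},
    {"order_id": 2, "product_id": 10, "description": "Widget", "quantity": 5},
]

# description is fixed by product_id alone: a partial dependency (2NF issue).
partial = depends_on(rows, ["product_id"], "description")
# quantity is not fixed by product_id alone...
part_only = depends_on(rows, ["product_id"], "quantity")
# ...but it is consistent with depending on the whole key.
whole_key = depends_on(rows, ["order_id", "product_id"], "quantity")

print(partial, part_only, whole_key)  # True False True
```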
Step 1: Source material can be in many different forms. To begin the normalization process, this example assumes that the sources were combined in an un-normalized form. All attributes of the relation must be identified, along with the key and any repeating groups. For example, in the un-normalized table above, EMPL NO is underlined to indicate that it is the key. EMPL NO, EMPL NAME, DEPT NO, DEPT NAME, EMPL SEX, COURSE NO, COURSE NAME, and ASSESSMENT are all attributes of the relation. COURSE DATA is recognized as a repeating group, and this is denoted with an asterisk.
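As a rough picture of the un-normalized form, one employee record can be written as a nested structure. The attribute names follow the notes; the values are invented for illustration. The repeating group shows up as a list nested inside the record.

```python
# One un-normalized employee record; EMPL NO is the key.
employee = {
    "empl_no": 1001,
    "empl_name": "A. Jones",
    "dept_no": 10,
    "dept_name": "Finance",
    "empl_sex": "F",
    # COURSE DATA: the repeating group (zero or more entries per employee).
    "course_data": [
        {"course_no": "C1", "course_name": "SQL Basics", "assessment": "Pass"},
        {"course_no": "C2", "course_name": "Data Modeling", "assessment": "Merit"},
    ],
}

# Any attribute holding a list of values is a repeating group.
repeating_groups = [
    name for name, value in employee.items() if isinstance(value, list)
]
print(repeating_groups)  # ['course_data']
```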
Step 2: To achieve First Normal Form (1NF), all repeating groups must be removed. The repeating groups were identified in Step 1; to remove them, a new key is created. The key now has two parts and is referred to as a concatenated key. For example, in the 1NF table above, the relation for the COURSE DATA repeating group now lists only the attributes of the repeating group, and EMPL NO and COURSE NO together comprise the concatenated key for the relation.
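The removal of the repeating group can be sketched as follows. The sample values are invented, and only the course attributes are carried into the new relation, following the wording of the note. Each nested course entry becomes its own row, identified by the concatenated key (EMPL NO, COURSE NO).

```python
# Un-normalized input: COURSE DATA repeats inside each employee record.
unnormalized = [
    {"empl_no": 1001, "empl_name": "A. Jones", "course_data": [
        {"course_no": "C1", "course_name": "SQL Basics", "assessment": "Pass"},
        {"course_no": "C2", "course_name": "Data Modeling", "assessment": "Merit"},
    ]},
    {"empl_no": 1002, "empl_name": "B. Smith", "course_data": [
        {"course_no": "C1", "course_name": "SQL Basics", "assessment": "Pass"},
    ]},
]

# 1NF: one flat row per (employee, course) pair.
course_rows = [
    {"empl_no": record["empl_no"], **course}
    for record in unnormalized
    for course in record["course_data"]
]

# The concatenated key must identify each row uniquely.
keys = [(row["empl_no"], row["course_no"]) for row in course_rows]
key_is_unique = len(keys) == len(set(keys))

print(len(course_rows), key_is_unique)  # 3 True
```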
Step 3: Removing partial dependencies from the relation results in Second Normal Form (2NF). Note that a relation that is in 1NF and has a single-attribute key is already in 2NF. If an attribute is not fully functionally dependent upon the entire key, it must be removed and placed in a new relation. A foreign key indicates the relationship between the relations. For example, in the 2NF table above, EMPL NAME, DEPT NO, DEPT NAME, and EMPL SEX depend only upon EMPL NO, not upon COURSE NO, so these items are separated into one relation. COURSE NAME depends only upon COURSE NO, so it is separated into a second relation. COURSE NO and EMPL NO now become foreign keys that indicate the relationships among the relations. ASSESSMENT is the only attribute dependent upon the whole key, so it is kept in a third relation keyed by EMPL NO and COURSE NO.
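The three-way split described in this step can be sketched as below. It assumes the 1NF starting point is a set of flat rows carrying every attribute, with the concatenated key (EMPL NO, COURSE NO); the sample values are invented.

```python
# Assumed 1NF input: flat rows keyed by (empl_no, course_no).
rows_1nf = [
    {"empl_no": 1001, "empl_name": "A. Jones", "dept_no": 10,
     "dept_name": "Finance", "empl_sex": "F",
     "course_no": "C1", "course_name": "SQL Basics", "assessment": "Pass"},
    {"empl_no": 1001, "empl_name": "A. Jones", "dept_no": 10,
     "dept_name": "Finance", "empl_sex": "F",
     "course_no": "C2", "course_name": "Data Modeling", "assessment": "Merit"},
    {"empl_no": 1002, "empl_name": "B. Smith", "dept_no": 20,
     "dept_name": "Sales", "empl_sex": "M",
     "course_no": "C1", "course_name": "SQL Basics", "assessment": "Pass"},
]

employee = {}    # keyed by empl_no
course = {}      # keyed by course_no
assessment = {}  # keyed by (empl_no, course_no); both parts are foreign keys

for row in rows_1nf:
    # Attributes that depend only on EMPL NO.
    employee[row["empl_no"]] = {
        "empl_name": row["empl_name"],
        "dept_no": row["dept_no"],
        "dept_name": row["dept_name"],
        "empl_sex": row["empl_sex"],
    }
    # Attribute that depends only on COURSE NO.
    course[row["course_no"]] = {"course_name": row["course_name"]}
    # Attribute that depends on the whole key.
    assessment[(row["empl_no"], row["course_no"])] = row["assessment"]

print(len(employee), len(course), len(assessment))  # 2 2 3
```

Each employee and each course name is now stored once rather than once per assessment row.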
Step 4: Removing mutual dependencies, that is, dependencies of one non-key attribute upon another (also known as transitive dependencies), results in Third Normal Form (3NF). If an attribute of a relation depends upon another non-key attribute, these attributes must be moved into another relation. A foreign key indicates the relationship between the two relations. For example, in the 3NF table above, DEPT NAME depends upon DEPT NO. DEPT NO remains in the employee relation, and a new relation is created for DEPT NO and DEPT NAME. DEPT NO in the employee relation becomes the foreign key.
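The department split can be sketched as follows, starting from an assumed 2NF employee relation with invented values. The helper dept_name_of is hypothetical and only shows how the foreign key is followed to recover the department name.

```python
# Assumed 2NF employee relation, keyed by empl_no.
employee_2nf = {
    1001: {"empl_name": "A. Jones", "dept_no": 10, "dept_name": "Finance", "empl_sex": "F"},
    1002: {"empl_name": "B. Smith", "dept_no": 20, "dept_name": "Sales", "empl_sex": "M"},
    1003: {"empl_name": "C. Patel", "dept_no": 10, "dept_name": "Finance", "empl_sex": "M"},
}

department = {}    # new relation, keyed by dept_no
employee_3nf = {}  # employee relation without dept_name

for empl_no, attrs in employee_2nf.items():
    # DEPT NAME depends on DEPT NO, so it moves to the department relation.
    department[attrs["dept_no"]] = {"dept_name": attrs["dept_name"]}
    # DEPT NO stays behind in the employee relation as the foreign key.
    employee_3nf[empl_no] = {k: v for k, v in attrs.items() if k != "dept_name"}


def dept_name_of(empl_no):
    """Follow the dept_no foreign key to look up the department name."""
    return department[employee_3nf[empl_no]["dept_no"]]["dept_name"]


print(len(department))     # 2
print(dept_name_of(1003))  # Finance
```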