BIS04 Data Modelling - II

Course Material for MBA course on Business Information Systems

Published in: Education, Technology, Business

  • Normalized data models are often referred to as relational models, although star schemas and snowflake schemas may also be implemented on top of relational database management systems. Normalization is the process of removing redundancy in data by separating the data into multiple tables, which designs for efficient and reliable single-record access. Relational database theorists have created rules by which the degree of normalization is measured. These degrees are called normal forms, and 3rd Normal Form is commonly accepted as the minimum degree of normalization. Normalization beyond 3rd Normal Form is often sacrificed because of hardware limitations. A properly normalized relational data model allows efficient use of storage space, elimination of redundant data, reduction or elimination of inconsistent data, and minimization of the data maintenance burden. However, an “over-normalized” data model may cause performance concerns: accessing the data requires large table joins, which slow response time. A normalized data model is in 3rd Normal Form when the following are true: repeating groups of data are removed (1st Normal Form); redundant data is removed (2nd Normal Form); and attributes of an entity depend upon the key, the whole key, and nothing but the key. Once the model has been normalized to at least 3rd Normal Form, the following are true: the structure is remarkably insensitive to change; structural paths for accessing information are very clear; Create, Report, Update and Delete anomalies are eliminated; performance can be an issue; and the structure can be very complex.
  • Normalized relational data modeling is the classic modeling technique for organizing entities defined by unique identifiers and attributes that are wholly dependent upon those identifiers. It is the technique that database administrators and modelers are most familiar with, and it is most commonly associated with transaction systems development. Normalization is the process of removing redundancy in data by separating the data into multiple tables. There are well-established rules of normalization: (1st Normal Form) Eliminate repeating groups. Make a separate table for each set of related attributes, and give each table a primary key. (2nd Normal Form) Eliminate redundant data. If an attribute depends on only part of a multi-attribute (composite) key, remove it to a separate table. (3rd Normal Form) Eliminate columns not dependent on the key. If attributes do not contribute to a description of the key, remove them to a separate table. (4th Normal Form) Isolate independent multiple relationships. No table may contain two or more 1:n or n:m relationships that are not directly related. (5th Normal Form) Isolate semantically related multiple relationships. There may be practical constraints on information that justify separating logically related many-to-many relationships. The last two rules, 4th and 5th Normal Forms, are not often attained. It is not uncommon, in fact, to denormalize from 3rd Normal Form in the physical model to address performance concerns. Consequently, the rest of this section does not cover these two forms.
  • Step 1: Source material can be in many different forms. To begin the normalization process, this example assumes that the sources have been combined into an un-normalized form. All attributes of the relation must be identified, along with the key and any repeating groups. For example, in the un-normalized table above, EMPL NO is underlined to indicate that it is the key. EMPL NO, EMPL NAME, DEPT NO, DEPT NAME, EMPL SEX, COURSE NO, COURSE NAME, and ASSESSMENT are all attributes of the relation. COURSE DATA is recognized as a repeating group, which is marked with an asterisk.
  • Step 2: To achieve First Normal Form (1NF), all repeating groups must be removed. The repeating groups were identified in Step 1, and to remove them a new key is created. The key of the relation now consists of two attributes, which is referred to as a concatenated key. For example, in the 1NF table above, the relation created for the COURSE DATA repeating group lists only the attributes of the repeating group, and EMPL NO and COURSE NO together form the concatenated key for the relation.
  • Step 3: Removing partial dependencies from the relation puts the model in Second Normal Form (2NF). Note that a relation that is in 1NF and has a single-attribute key is already in 2NF. If an attribute is not fully functionally dependent upon the entire key, it must be removed and a new relation must be created. A foreign key indicates the relationship between the relations. For example, in the 2NF table above, EMPL NAME, DEPT NO, DEPT NAME, and EMPL SEX depend only upon EMPL NO, not upon COURSE NO, so they are separated into a relation of their own. COURSE NAME depends only upon COURSE NO, so it is likewise separated into its own relation. COURSE NO and EMPL NO now become foreign keys that indicate the relationships among the relations. ASSESSMENT is the only attribute dependent upon the whole key, hence the creation of a third relation.
  • Step 4: Removing mutual (transitive) dependencies from the relation results in Third Normal Form (3NF). If a non-key attribute of a relation depends upon another non-key attribute, these attributes must be moved into another relation. A foreign key indicates the relationship between the two relations. For example, in the 3NF table above, DEPT NAME depends upon DEPT NO. DEPT NO remains in the employee relation, and a new relation is created for DEPT NO and DEPT NAME. DEPT NO in the employee relation becomes the foreign key.
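The four steps in the notes above can be sketched in plain Python. This is only an illustration under stated assumptions: the attribute names follow the notes, but every sample value and every variable name is invented, and dictionaries stand in for relations.

```python
# Un-normalized input: one record per employee, with COURSE DATA as a
# repeating group. All sample values are invented for illustration.
unnormalized = [
    {"empl_no": 1, "empl_name": "Asha", "dept_no": 10, "dept_name": "Finance",
     "empl_sex": "F",
     "course_data": [
         {"course_no": "C1", "course_name": "SQL Basics", "assessment": "A"},
         {"course_no": "C2", "course_name": "Data Modelling", "assessment": "B"},
     ]},
    {"empl_no": 2, "empl_name": "Ravi", "dept_no": 10, "dept_name": "Finance",
     "empl_sex": "M",
     "course_data": [
         {"course_no": "C1", "course_name": "SQL Basics", "assessment": "B"},
     ]},
]

# 1NF: flatten the repeating group; the key becomes (empl_no, course_no).
first_nf = [
    {**{k: v for k, v in emp.items() if k != "course_data"}, **c}
    for emp in unnormalized
    for c in emp["course_data"]
]

# 2NF: attributes that depend on only part of the key move to their own relations.
employee_2nf = {
    r["empl_no"]: (r["empl_name"], r["dept_no"], r["dept_name"], r["empl_sex"])
    for r in first_nf
}
course = {r["course_no"]: r["course_name"] for r in first_nf}
assessment = {(r["empl_no"], r["course_no"]): r["assessment"] for r in first_nf}

# 3NF: dept_name depends on dept_no rather than on empl_no, so it gets its
# own relation; dept_no stays in employee as the foreign key.
department = {dept_no: dept_name
              for (_, dept_no, dept_name, _) in employee_2nf.values()}
employee = {empl_no: (name, dept_no, sex)
            for empl_no, (name, dept_no, _, sex) in employee_2nf.items()}
```

After the last step, the department name "Finance" is stored once in `department` instead of once per employee-course row.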

    1. Business Information Systems. Data Modeling - Normalisation. Prithwis Mukerjee, Ph.D.
    2. Normalization
       NORMALIZATION: A formal data modeling approach to examining and validating the model.
         • Pros:
             • Ensures that each attribute belongs to the entity to which it is assigned
             • Redundant storage of information is minimized
         • Cons:
             • Can adversely affect performance if rigorously implemented
             • Can adversely affect deadlines if rigorously implemented
    3. Foundations: Revisited
         • Entity in the Data Model
             • The basic element about which we need to store and process data
                 • Order, Customer, Product
                 • For each entity there will be multiple instances
                     • Multiple “orders” with the means to distinguish one specific “order” from another
             • Actions can be performed on an entity
                 • Create / Add
                 • Read / Display
                 • Update / Modify
                 • Delete
         • Object Oriented Terminology
             • Class: equivalent to an entity
                 • Instances of an entity are instances of a class
             • Methods: Actions that affect a class
                 • Most methods can be mapped down to variations of C-R-U-D
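As a rough illustration of the entity-to-class mapping on this slide, the sketch below models an Order entity as a Python class and wraps the four C-R-U-D actions in a small in-memory store. The names `Order`, `OrderStore`, and the `status` attribute are assumptions made for the example and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """One instance of the Order entity (class)."""
    order_id: int          # distinguishes one specific order from another
    status: str = "open"   # hypothetical attribute


class OrderStore:
    """In-memory stand-in for a table of Order instances."""

    def __init__(self):
        self._rows = {}

    def create(self, order):
        if order.order_id in self._rows:
            raise KeyError(f"order {order.order_id} already exists")
        self._rows[order.order_id] = order

    def read(self, order_id):
        return self._rows[order_id]

    def update(self, order_id, **changes):
        order = self._rows[order_id]
        for name, value in changes.items():
            setattr(order, name, value)

    def delete(self, order_id):
        del self._rows[order_id]
```

A typical sequence would be `store.create(Order(1))`, then `store.update(1, status="shipped")`, then `store.read(1)`, and finally `store.delete(1)`.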
    4. Normal Forms
         • Dr. E. F. Codd identified ‘normal forms’ as the different states of a ‘normalized relational’ data model.
             • 1NF = No repeating groups
             • 2NF = No partial key dependencies
             • 3NF = No non-key interdependencies
             • 4NF = No independent multiple relationships
             • 5NF = No semantically related multiple relationships
    5. How to Normalize an Entity
         • Before you normalize an entity, identify its Primary Key
         • Identify and resolve violations of 1NF - make sure there are no repeating groups
         • Identify and resolve violations of 2NF - make sure that each non-key attribute depends on the entire key
         • Identify and resolve violations of 3NF - make sure that no non-key attribute depends on another non-key attribute
    6. Order Management System
         • Orders are the life blood of any commercial organisation
             • Orders are received or recorded / created
             • Reports are prepared on orders at hand. Orders are viewed
             • Orders are modified or updated depending on the situation
             • Orders are cancelled or archived when they are no longer necessary
    7. Identify the Primary Key [Diagram: two ORDER entity boxes]
    8. First Normal Form - 1NF
         • Ask the following for each attribute:
             • Does this attribute occur more than once for any given instance? (NO REPEATING GROUPS)
         • If yes,
             • build a new entity
             • move all ‘repeating’ attributes to the new entity
             • select or formulate attribute(s) for the new entity’s PK
             • build an IDENTIFYING relationship FROM THE ORIGINAL entity TO THE NEW entity
    9. 1NF Violation & Solution
       Violation: Repeating groups
       Solution: Split into two entities
       [Diagram: ORDER split into ORDER and ORDER ITEM, linked by a foreign key (FK)]
    10. Cleaner 1NF Solution
       [Diagram: ORDER and ORDER ITEM, linked by a foreign key (FK)]
         • The new entity should be named to reflect its intention, given a primary key, and the inherited foreign key will be present as part of the primary key.
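A minimal sketch of the 1NF split described in slides 8 to 10, using plain Python dictionaries. The original diagrams did not survive extraction, so the attribute names (`order_id`, `customer_name`, `product_id`, `description`, `unit_price`, `quantity`) and all values are assumptions chosen for illustration.

```python
# An un-normalized order: the item attributes repeat within one ORDER instance.
# Attribute names and values are hypothetical.
order = {
    "order_id": 101,
    "customer_name": "Acme Ltd",
    "items": [
        {"product_id": "P1", "description": "Widget", "unit_price": 2.5, "quantity": 4},
        {"product_id": "P2", "description": "Gadget", "unit_price": 10.0, "quantity": 1},
    ],
}

# ORDER keeps only the non-repeating attributes.
order_row = {k: v for k, v in order.items() if k != "items"}

# ORDER ITEM: the inherited foreign key order_id is part of the
# primary key (order_id, product_id).
order_item_rows = {
    (order["order_id"], item["product_id"]): {
        "description": item["description"],
        "unit_price": item["unit_price"],
        "quantity": item["quantity"],
    }
    for item in order["items"]
}
```

Each ORDER ITEM row is now addressed by the pair `(order_id, product_id)`, which is what makes the relationship from ORDER an identifying one.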
    11. Second Normal Form - 2NF
         • For ONLY those entities that have a composite key, ask the following of each non-key attribute:
             • Is this attribute dependent on part of the primary key? (NO PARTIAL KEY DEPENDENCIES)
         • If yes,
             • build a new entity
             • move all the attributes having the same partial key dependency to the new entity
             • use the determinant attribute as the key or determine a better PK (move the PK attribute too)
             • build and name an identifying relationship FROM NEW entity BACK TO ORIGINAL entity
    12. 2NF Violation and Solution
       Violation: Description and Unit Price do not depend on the full primary key
       Solution: Split into two entities
       [Diagram: ORDER ITEM split into ORDER ITEM and PRODUCT, linked by a foreign key (FK)]
    13. Now the solution looks like ...
       [Diagram: ORDER, ORDER ITEM, PRODUCT]
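The 2NF step in slides 11 to 13 could be sketched as follows. Description and Unit Price are named on the slide; `order_id`, `product_id`, `quantity`, and every value are assumptions made for the example.

```python
# ORDER ITEM rows in 1NF, keyed by (order_id, product_id). Values are hypothetical.
order_items_1nf = {
    (101, "P1"): {"description": "Widget", "unit_price": 2.5, "quantity": 4},
    (101, "P2"): {"description": "Gadget", "unit_price": 10.0, "quantity": 1},
    (102, "P1"): {"description": "Widget", "unit_price": 2.5, "quantity": 7},
}

product = {}      # PRODUCT: attributes determined by product_id alone
order_item = {}   # ORDER ITEM: only attributes that need the whole key remain

for (order_id, product_id), row in order_items_1nf.items():
    # Stored once per product, however many orders mention it.
    product[product_id] = {
        "description": row["description"],
        "unit_price": row["unit_price"],
    }
    # product_id stays in the key and now acts as a foreign key to PRODUCT.
    order_item[(order_id, product_id)] = {"quantity": row["quantity"]}
```

Before the split, the description "Widget" was stored on two rows; afterwards it appears once, in `product["P1"]`.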
    14. Third Normal Form - 3NF
         • For each non-key attribute, ask the following:
             • Does this attribute depend on some other non-key attribute? (NO NON-KEY INTERDEPENDENCIES)
         • If yes,
             • build a new entity to contain all attributes with the same non-key dependency
             • use determinant attribute(s) as the PK
             • build and name a non-identifying, non-mandatory relationship FROM NEW entity BACK TO ORIGINAL entity
    15. 3NF Violation & Solution
       Violation: Address and Credit Limit do not depend on OrderID, but on Customer Name
       Solution: Create a separate entity for Customer
       [Diagram: ORDER split into ORDER and CUSTOMER, linked by a foreign key (FK)]
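The 3NF step in slides 14 and 15 might look like this in Python. OrderID, Customer Name, Address, and Credit Limit are the attributes named on the slide; the sample values and variable names are invented, and using the customer name as the key simply follows the slide rather than recommending it.

```python
# ORDER rows before 3NF, keyed by order_id. Values are hypothetical.
orders_before = {
    101: {"customer_name": "Acme Ltd", "address": "1 Main St", "credit_limit": 5000},
    102: {"customer_name": "Acme Ltd", "address": "1 Main St", "credit_limit": 5000},
    103: {"customer_name": "Bolt Inc", "address": "9 High St", "credit_limit": 800},
}

customer = {}   # CUSTOMER: keyed by the determinant attribute, customer_name
order = {}      # ORDER: keeps customer_name only as a foreign key

for order_id, row in orders_before.items():
    customer[row["customer_name"]] = {
        "address": row["address"],
        "credit_limit": row["credit_limit"],
    }
    order[order_id] = {"customer_name": row["customer_name"]}
```

A change of address for "Acme Ltd" now touches one CUSTOMER row instead of every ORDER row for that customer.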
    16. Final Solution
       [Diagram: the un-normalised ORDER entity alongside the normalised entities ORDER, CUSTOMER, ORDER ITEM and PRODUCT, linked by foreign keys (FK)]
       Information about Customers and Products can be recorded even when there are no Orders
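One possible rendering of the four normalised entities as tables, using Python's built-in `sqlite3` module. The column names and types are assumptions, since the original diagram is not available; the point of the sketch is only that a customer and a product can be stored while the order table is still empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

# Hypothetical columns; "order" is quoted because ORDER is an SQL keyword.
conn.executescript("""
CREATE TABLE customer (
    customer_name TEXT PRIMARY KEY,
    address       TEXT,
    credit_limit  REAL
);
CREATE TABLE product (
    product_id  TEXT PRIMARY KEY,
    description TEXT,
    unit_price  REAL
);
CREATE TABLE "order" (
    order_id      INTEGER PRIMARY KEY,
    customer_name TEXT REFERENCES customer(customer_name)
);
CREATE TABLE order_item (
    order_id   INTEGER REFERENCES "order"(order_id),
    product_id TEXT    REFERENCES product(product_id),
    quantity   INTEGER,
    PRIMARY KEY (order_id, product_id)
);
""")

# Customers and products can be recorded before any order exists.
conn.execute("INSERT INTO customer VALUES ('Acme Ltd', '1 Main St', 5000)")
conn.execute("INSERT INTO product VALUES ('P1', 'Widget', 2.5)")

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
order_count = conn.execute('SELECT COUNT(*) FROM "order"').fetchone()[0]
```

In the single un-normalised ORDER entity, the same customer and product details could not have been stored without inventing an order to hold them.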
    17. Entities can proliferate!
         • Order Management System
             • Began with ORDER
             • Ended with ORDER, ORDER-ITEM, PRODUCT, CUSTOMER
         • Manufacturing System
             • Might begin with PRODUCT
             • End with PRODUCT, MATERIAL, MACHINE, DIMENSION ???
         • Marketing System
             • Might begin with CUSTOMER
             • End with ??
         • BUT: Entities should be unique
             • CUSTOMER must have the same attributes whether defined in
                 • Order Management System
                 • Marketing System
             • PRODUCT must have the same attributes whether defined in
                 • Order Management System
                 • Manufacturing System
         • This is why Data Modelling is so important
    18. Rationale for Normalisation
         • Data is easier to define
         • Data interdependencies are identified
         • Data ambiguities are resolved
         • Data model can be more flexible
         • Data model is easier to maintain
         • The structure can be very complex
             • Proliferation of entities and relationships
         • Performance can become an issue
    19. Denormalisation?
         • One entity for the entire month?
             • Month-Attendance
                 • Emp ID
                 • Month
                 • Year
                 • Day 1
                 • Day 2
                 • Day 3
                 • ...
                 • Day 31
         • 1 record per employee per month
             • 10,000 employees, 12 months = 120,000 records
         • One entity for each day
             • Daily-Attendance
                 • Emp ID
                 • Month
                 • Year
                 • Date
                 • YES / NO
         • 29 to 31 records per employee every month
             • 10,000 employees, 12 months = 3.6 million records
       This will cause a performance problem
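The record counts on this slide can be reproduced with a few lines of arithmetic. The figure of 30 days per month is an assumed average (the slide says 29 to 31), chosen because it matches the slide's 3.6 million.

```python
employees = 10_000
months = 12
days_per_month = 30  # assumed average; actual months have 28 to 31 days

# Month-Attendance: one record per employee per month.
monthly_records = employees * months

# Daily-Attendance: one record per employee per day.
daily_records = employees * months * days_per_month

# How many times more rows the per-day design produces.
ratio = daily_records // monthly_records

print(monthly_records, daily_records, ratio)
```

Under these assumptions the per-day design holds about thirty times as many rows per year as the per-month design, which is the trade-off the slide is pointing at.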
    20. The Managerial Perspective
         • Entities
             • Have all entities in the system been identified and named correctly?
             • Have all attributes of the entities been identified and named correctly?
         • Normalisation
             • Are all entities in Third Normal Form?
             • If NOT, why NOT?
             • Have we gone overboard with Normalisation?
                 • Is there a need to de-normalise?
         • A quick way to remember the rules of normalisation is that every attribute in the entity must depend
             • On THE KEY
             • THE WHOLE KEY and
             • NOTHING BUT THE KEY
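The "key, whole key, nothing but the key" rule can be probed against sample data with a small helper. The function `determines` and the sample rows below are hypothetical and not part of the course material. Note that sample data can only refute a dependency; a dependency that happens to hold in the sample is a hint that still needs confirmation from the business rules.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # The first occurrence records the rhs value; later ones must match it.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical ORDER ITEM rows with the composite key (order_id, product_id).
rows = [
    {"order_id": 101, "product_id": "P1", "description": "Widget", "quantity": 4},
    {"order_id": 101, "product_id": "P2", "description": "Gadget", "quantity": 1},
    {"order_id": 102, "product_id": "P1", "description": "Widget", "quantity": 7},
]

# description appears to depend on part of the key only: a 2NF warning sign.
partial = determines(rows, ["product_id"], ["description"])

# quantity is not determined by either half of the key on its own...
by_product = determines(rows, ["product_id"], ["quantity"])
by_order = determines(rows, ["order_id"], ["quantity"])

# ...but it is determined by the whole key.
by_whole_key = determines(rows, ["order_id", "product_id"], ["quantity"])
```

Here `partial` flags `description` as a candidate for moving to a PRODUCT entity, while `quantity` passes the "whole key" test.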
