DENORMALISATION: PROS AND CONS Presented by Aliya Saldanha
OBJECTIVES Define terms. Describe the denormalization design process. Survey denormalization strategies. Present a comparative case study. Weigh the pros and cons of denormalization. Examine the Dangerous Illusion. Conclude.
Introduction RDBMS design spans conceptual and physical modeling levels. Conceptual diagrams are a precursor to designing relational tables. The critical issue is the level of system performance, reflected in system response time.
Normalization The normalized model is a cornerstone of every database system. It is the process of decomposing large, inefficiently structured tables into smaller, better-structured tables without losing any data in the process. Even so, there are times when we denormalize a database to enhance performance.
What is normalization? A series of steps followed to obtain a database that is consistent and avoids duplication. The process passes through fulfilling normal forms: a table is said to be in a certain normal form if it satisfies certain constraints. KEY POINTS: each table represents a single subject; redundancy is kept to a minimum; all attributes are dependent on the primary key; the stability and integrity of the E-R diagram are checked; insert, update, and delete anomalies are removed. The ladder runs from the relational db model to the normalized relational db model: 1st Normal Form, 2nd Normal Form, 3rd Normal Form, BCNF, 4th Normal Form, 5th Normal Form.
As normalization progresses… The number of relations required to represent the data of the application being normalized increases. The increased number of tables requires multiple JOINs to combine data from different tables (the more joins, the worse it gets). Queries with many complex joins require more CPU and adversely affect performance.
Practically speaking Queries run slowly. Reports take too long to print. On-screen forms take time to populate. Web pages take too long to populate. More complicated SQL is required for multi-table queries and joins. In short, extra work for the DBMS can mean slower applications.
Other issues… No calculated values. Calculated values are a fact of life for all applications, but a normalized database lacks them. Non-reproducible calculations. The application must generate them on the fly as needed; if your application changes over time, you risk not being able to reproduce prior results. Join jungles. When each fact is stored in exactly one place, it is daunting to pull together everything needed for a certain query, making queries hard to code, hard to debug, and dangerous to alter. Performance. When you face a join jungle, you almost always face performance problems.
Before denormalizing, ask: Can the system achieve acceptable performance without denormalizing? Will the performance of the system after denormalizing still be unacceptable? Will the system be unreliable due to denormalization? If the answer to any of these is "yes," avoid denormalization, because any benefit accrued will not exceed the cost.
Denormalization and Why? Frequently, performance needs dictate very quick retrieval capability for data stored in relational databases. To accomplish this, sometimes the decision is made to denormalize the physical implementation.  Denormalization is the process of putting one fact in numerous places. This speeds data retrieval at the expense of data modification.
Does it mean un-normalization? 'Denormalization' does not mean that anything goes; denormalization does not mean chaos. An un-normalized data model is one on which little or no analysis has been performed. In short, only denormalize a data model that has already been normalized.
DENORMALIZATION PROCESS 1. Develop the E-R model. 2. Refine and normalize. 3. Identify candidates for denormalization. 4. Determine the effects on data integrity. 5. Identify the form of the denormalized entity. 6. Map the conceptual schema to the physical schema.
Step 1: Development of the conceptual data model. E-R modeling aims at identifying the entities that are part of the system, the attributes that make up these entities, and the dependencies between entities. No dependency among the attributes: normalization resolves the functional dependencies between attributes. Shows data at rest: denormalization, by contrast, considers the types of queries and their frequency.
Step 2: Refinement and normalization. The ERD is further refined in order to resolve the functional dependencies between the attributes of an entity; this may lead to splitting tables to reduce data redundancy. Step 3: Identifying candidates for denormalization, based on: application performance criteria; the type of queries to be executed (update/retrieve); the frequency of queries; the number of rows accessed by each transaction; cardinality (1:1, 1:M); derived data and lookup data.
Step 4: Determine the effect on data integrity. The effect of denormalization is reviewed: denormalizing may lead to performance degradation or unacceptable consistency issues, in which case the denormalization decision must be reconsidered.
Step 5: Form for the denormalized entity. Identify what form the denormalized entity may take; we move back down the ladder of normal forms. Step 6: Map the conceptual scheme to the physical scheme. Once the scheme is tested and verified, it is implemented.
DENORMALIZATION STRATEGIES Pre-joined Tables, Report Tables, Mirror Tables, Split Tables, Redundant Data, Repeating Groups, Derivable Data, Speed Tables.
Pre-joined tables Two or more tables are joined and the result is stored as another table, used when the cost of joining is prohibitive. Example: retail store databases. The pre-joined table should contain only those columns absolutely necessary for the application to meet its processing needs, and must be re-created periodically using SQL to join the normalized tables, as sketched below.
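A minimal sketch of this strategy, using the SALES and SALES_DETAIL tables from the case study later in this deck (Oracle-style SQL assumed; product_id is an illustrative column, not taken from the deck):

drop table sales_and_details;

create table sales_and_details as
select s.sale_id,
       s.sale_date,
       d.product_id,      -- assumed column, for illustration
       d.product_qty
from   sales s
join   sales_detail d on d.sale_id = s.sale_id;

The drop-and-recreate pair is what "created periodically" amounts to in practice: the pre-joined copy is refreshed on a schedule, not maintained row by row.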
1:1 Relationships
M:M Relationship
The normalised tables
Denormalised tables
Report Tables When specialized critical reports are too costly to generate on demand, create a table that contains the report, to be viewed in online environments. The heavy formatting and data manipulation are done once, up front. A sketch follows.
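A sketch of a report table, assuming the SALES_AND_DETAILS table above and a hypothetical monthly summary report (monthly_sales_report and its columns are illustrative names):

create table monthly_sales_report as
select to_char(sale_date, 'YYYY-MM') as sale_month,  -- pre-formatted for the report
       count(distinct sale_id)       as num_sales,
       sum(product_qty)              as total_qty
from   sales_and_details
group  by to_char(sale_date, 'YYYY-MM');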
Mirror tables When tables are required concurrently by two different types of environments, for example when online processing and decision support access the same table, the table can be duplicated and the second copy used for read-only access. Example: heavy online traffic. Care must be taken to periodically migrate the foreground data to the background tables. Performance bottlenecks are thereby resolved.
Split tables When distinct groups use different parts of a table, it can be split vertically or horizontally. The original table must remain available for certain transactions.
Vertical Split Attributes are divided between the two tables, with the primary key placed in both. Particularly useful if one group of applications accesses some columns and another group accesses different columns. Example: many columns of the customer table contain data specific to credit limit assessment, whereas others contain more general contact and customer profiling information. Split the table vertically, one partition containing the credit limit information, the other the more general customer details, as sketched below.
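A sketch of the vertical split; the CUSTOMER column names are assumptions for illustration:

-- credit-assessment columns, keyed by the shared primary key
create table customer_credit as
select customer_no, credit_limit, credit_rating
from   customer;

-- general contact and profiling columns
create table customer_profile as
select customer_no, cust_name, address, phone
from   customer;

Note that the primary key, customer_no, appears in both partitions so the original rows can be reassembled by a join when needed.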
Horizontal Split Rows are divided between two tables, usually by range of key values. A UNION ALL applied later should not yield more rows than the original, un-split table contained. Example: a large customer table might be split into two tables, one for home-based customers and the other for overseas customers, as sketched below.
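A sketch of the horizontal split, assuming a region column as the discriminator (an illustrative assumption):

create table customer_home as
select * from customer where region = 'HOME';

create table customer_overseas as
select * from customer where region <> 'HOME';

-- sanity check: the reassembled rows must match the original row count
select count(*)
from   (select * from customer_home
        union all
        select * from customer_overseas);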
Redundant Data Some columns of another table are made redundant in a given table to reduce the number of table joins. Use when one or more columns from one table are accessed whenever data from another table is accessed. The original column must not be removed from its table. Best for data that is not updated often. Example: consider the DEPARTMENT and EMPLOYEE tables; if queries always require the name of the employee's department, the department name column can be carried as redundant data in the EMP table, as in the sketch below.
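A sketch using the DEPARTMENT and EMP tables named in the example (the column names are assumptions):

alter table emp add (dept_name varchar2(50));   -- redundant copy of the department name

update emp e
set    e.dept_name = (select d.dept_name
                      from   department d
                      where  d.dept_no = e.dept_no);

-- the join to DEPARTMENT is now avoided for this common query
select emp_name, dept_name from emp;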
Repeating Groups Another table is created that contains a column for every element of the group. Example: A (Customer_No, Balance_period, Balance) becomes B (Customer_No, Balance_period1, Balance_period2, Balance_period3, Balance_period4, Balance_period5). Points to remember: the data is rarely or never aggregated, averaged, or compared within the row; the data has a stable number of occurrences; the data is usually accessed collectively; the data has a predictable pattern of insertion and deletion. A DDL sketch follows.
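A DDL sketch of the two designs from the example (data types are assumptions):

-- A: normalized, one row per balance period
create table customer_balance_a (
  customer_no    number,
  balance_period number,
  balance        number(12,2),
  primary key (customer_no, balance_period)
);

-- B: denormalized, the repeating group folded into a single row
create table customer_balance_b (
  customer_no     number primary key,
  balance_period1 number(12,2),
  balance_period2 number(12,2),
  balance_period3 number(12,2),
  balance_period4 number(12,2),
  balance_period5 number(12,2)
);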
Derivable data Derived data is not stored directly in the database but is instead calculated from data that is stored there. When the cost of deriving data using complicated formulae is prohibitive, consider storing the derived data in a column instead of calculating it. Example: score calculation, sketched below. The stored derived data must be updated whenever the underlying data it is based on changes.
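One way to keep stored derived data in step with its inputs is a trigger; this sketch uses a hypothetical exam-score table (all names are assumptions):

create table exam_result (
  student_id  number,
  part1_score number,
  part2_score number,
  total_score number            -- derived: part1_score + part2_score
);

create or replace trigger trg_exam_total
before insert or update on exam_result
for each row
begin
  :new.total_score := :new.part1_score + :new.part2_score;
end;
/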
Speed tables A speed table is a denormalized version of a hierarchy: every parent has a row for every child that reports to it at any level, directly or indirectly. A speed table optionally carries information such as the level within the hierarchy and whether or not the child is at the detail-most level (the bottom of the tree). Used when a tree-like hierarchy is to be stored in the database. Data is replicated within a speed table to increase the speed of retrieval; see the sketch below.
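A sketch of a speed table for a hypothetical department hierarchy (all names are assumptions): every (ancestor, descendant) pair is materialized, so a whole subtree can be fetched in one scan.

create table dept_speed (
  parent_id   number,    -- ancestor, at any level
  child_id    number,    -- descendant, at any level
  lvl         number,    -- depth of the child below the parent (LEVEL is reserved in Oracle)
  detail_flag char(1)    -- 'Y' if the child is at the bottom of the tree
);

-- all departments under department 42, at any depth, without recursion:
select child_id
from   dept_speed
where  parent_id = 42;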
(Figure: the normalised hierarchy table compared with its denormalised speed table.)
CASE STUDY: Prejoin. A simplified retail example. Before denormalization: SALES (1) to SALES_DETAIL (M), one sale to many detail rows.
Prejoin Denormalization A simplified retail example... After denormalization: a single SALES_AND_DETAILS table.
SAMPLE QUERY Q) What was my total volume between '06-AUG-08' and '06-AUG-09'? BEFORE denormalization:
select sum(sales_detail.product_qty)
from   sales, sales_detail
where  sales.sale_id = sales_detail.sale_id
and    sales.sale_date between TO_DATE('06-AUG-08','DD-MON-YY')
                           and TO_DATE('06-AUG-09','DD-MON-YY');
Sample Query 2 Q) What was my total volume between '06-AUG-08' and '06-AUG-09'? AFTER denormalization:
select sum(product_qty)
from   sales_and_details
where  sales_and_details.sale_date between TO_DATE('06-AUG-08','DD-MON-YY')
                                       and TO_DATE('06-AUG-09','DD-MON-YY');
Sample Query 3 What happens if we ask about the number of "sales" rather than the quantity transacted? BEFORE denormalization:
select count(*)
from   sales
where  sales.sale_date between TO_DATE('06-AUG-08','DD-MON-YY')
                           and TO_DATE('06-AUG-09','DD-MON-YY');
Sample Query 4 What happens if we ask about the number of "sales" rather than the quantity transacted? AFTER denormalization:
select count(distinct sale_id)
from   sales_and_details
where  sales_and_details.sale_date between TO_DATE('06-AUG-08','DD-MON-YY')
                                       and TO_DATE('06-AUG-09','DD-MON-YY');
Note that the denormalized version must use DISTINCT to avoid over-counting sales that are repeated across their detail rows.
PROS Convenience: with stored calculated values it is far easier for programmers to generate reports, without having to write code to calculate them; it also saves CPU time. Simple queries: each eliminated JOIN yields a simpler query that is easier to get right the first time, easier to debug, and easier to keep correct when changed.
PROS The performance argument: we improve performance (speed) because we need fewer JOINs to retrieve the same number of facts. The storage argument: data is made available at the locations where it will be used; the number of foreign keys is reduced (foreign keys record how separate tables are related), and the number of indexes is reduced (foreign keys are frequently indexed).
CONS Leads to data duplication and increases the storage requirements of the database. Adds overhead for documenting design decisions, ensuring valid data, and migrating data. Having multiple copies leads to synchronization issues. Increases update time.
Physically speaking… Performance is determined entirely at the physical database level: storage and access methods, hardware, physical design, DBMS implementation details, and the degree of concurrent access.
AN ILLUSION The case for denormalization rests on a chain of claims: 1. The higher the normalization, the greater the number of tables. 2. A greater number of tables requires more joins. 3. Joins slow performance. 4. Denormalization reduces the number of tables, hence fewer joins and improved performance. The problem is that points 2 and 3 are not necessarily true, in which case point 4 does not hold; and even when they do hold, the performance gain must be weighed against the integrity cost discussed next.
It is claimed that, from the integrity perspective, there are two database design options: fully normalize the database, thereby maximizing the simplicity of integrity enforcement; or denormalize the database and complicate integrity enforcement. According to the illusion argument, the first choice is the better option. Why, then, the prevailing insistence on the second choice? The argument for denormalization is, of course, based on performance considerations.
Conclusion In a real-life project, you may have to bring back some data redundancy for performance reasons. Database design is about efficient data engineering: trade-offs in design choices, and choosing the right design for the performance requirements. As most database practitioners note, denormalization may or may not result in better performance or a more flexible data structure for users; selective denormalization is usually required. Weigh whether the perceived benefits are worth the effort to maintain the database properly. Weighing these pros and cons carefully is therefore of vital importance.
References [1] G. Lawrence Sanders & Seung Kyoon Shin, Denormalization Effects on Performance of RDBMS, Proceedings of the 34th Hawaii International Conference on System Sciences, 2001. [2] Seung Kyoon Shin & G. Lawrence Sanders, Denormalization Strategies for Data Retrieval from Data Warehouses. [3] Marsha Hanus, To Normalize or Denormalize, That is the Question, Candle Corporation. [4] Craig S. Mullins, Denormalization Guidelines, PLATINUM technology, inc., June 1, 1997. [5] Douglas B. Bock & John F. Schrage, Department of Computer Management and Information Systems, Southern Illinois University Edwardsville, in the 1996 Proceedings of the Decision Sciences Institute, Orlando, Florida, November 1996. [6] Fabian Pascal, The Dangerous Illusion: Denormalization, Performance and Integrity, Parts 1 and 2, DM Review Magazine, July 2002. [7] Zhou Wei (Tsinghua University, Beijing), Jiang Dejun (Tsinghua University), Guillaume Pierre (Vrije Universiteit Amsterdam), Chi-Hung Chi (Tsinghua University) & Maarten van Steen (Vrije Universiteit Amsterdam), Service-Oriented Data Denormalization for Scalable Web Applications, April 21-25, 2008, Beijing, China. [8] Michael J. Hernandez, Understanding Normalisation, 2001-2003. [9] Morteza Zaker, Somnuk Phon-Amnuaisuk & Su-Cheng Haw, Hierarchical Denormalizing: A Possibility to Optimize the Data Warehouse Design. [10] Eghosa Ugboma, How Valuable is Planned Data Redundancy in Maintaining the Integrity of an Information System through its Database, Florida Memorial University. [11] Zornitsa Zaharieva, Introduction to Databases, Database Design and SQL, CERN. [12] The Data Administration Newsletter, TDAN.com.
THANK YOU
Anomalies Anomalies are inconsistencies in data that occur due to unnecessary redundancy. Update anomaly: some copies of a data item are updated, but others are not. Insertion anomaly: "real" data can't be inserted without also inserting unrelated or "made-up" data. Deletion anomaly: some data can't be deleted without also deleting other, unrelated data.
First Normal Form (1NF) If a table of data meets the definition of a relation, it is in first normal form. Every relation has a unique name. Every attribute value is atomic (single-valued). Every row is unique. Attributes in tables have unique names. The order of the columns is irrelevant. The order of the rows is irrelevant.
Second Normal Form (2NF) 1NF and no partial functional dependencies. Partial functional dependency : when one or more non-key attributes are functionally dependent on part of the primary key. Every non-key attribute must be defined by the entire key, not just by part of the key. If a relation has a single attribute as its key, then it is automatically in 2NF.
Second Normal Form (2NF): a relation that is not in 2NF.
ACTIVITY (Student_ID, Activity, Fee)
Key: Student_ID, Activity
Activity → Fee (Fee is determined by Activity alone, a partial dependency)
Divide the relation into two relations that now meet 2NF:
STUDENT_ACTIVITY (Student_ID, Activity), Key: Student_ID, Activity
ACTIVITY_COST (Activity, Fee), Key: Activity; Activity → Fee
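A DDL sketch of this 2NF decomposition (data types are assumptions):

create table activity_cost (
  activity varchar2(30) primary key,
  fee      number(8,2)                 -- Activity → Fee is now a full-key dependency
);

create table student_activity (
  student_id number,
  activity   varchar2(30) references activity_cost,
  primary key (student_id, activity)
);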
Third Normal Form (3NF) 2NF and no transitive dependencies. Transitive dependency: a functional dependency between two or more non-key attributes.
A relation with a transitive dependency:
HOUSING (Student_ID, Building, Fee)
Key: Student_ID
Student_ID → Building → Fee (Building → Fee is the transitive dependency)
Divide the relation into two relations that now meet 3NF:
STUDENT_HOUSING (Student_ID, Building), Key: Student_ID; Student_ID → Building
BUILDING_COST (Building, Fee), Key: Building; Building → Fee
Third Normal Form (3NF): in 2NF, every non-key column must be mutually independent, which rules out stored calculations. Solution: put calculations in queries and forms.
OrderDetails (OrderID, Item, Quantity, Price). Put the expression in a text control or in the query: =Quantity * Price

Item     Quantity   Price   Total
Hammer   2          $10     $20
Saw      5          $40     $200
Nails    8          $1      $8
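In SQL the same rule looks like this: compute Total in the query rather than storing it (table and column names follow the slide):

select item,
       quantity,
       price,
       quantity * price as total   -- derived on the fly, never stored
from   order_details;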
BCNF 3NF and every determinant is a candidate key.
A relation where a determinant is not a candidate key.
Note: students can have a double major, with an advisor for each major; an advisor works only with students in their assigned area.
STUDENT_ADVISOR (Student_ID, Advisor, Major)
Primary Key: Student_ID, Major
Candidate Key: Student_ID, Advisor
Advisor → Major (Advisor is a determinant but not a candidate key)
Divide the relation into two relations that meet BCNF:
STUDENT_ADVISOR (Student_ID, Advisor), Key: Student_ID, Advisor
ADVISOR_MAJOR (Advisor, Major), Key: Advisor; Advisor → Major
Speed Tables