A brief overview of denormalization in RDBMS. The slides cover the basics of normalization, then discuss why, when, and how to denormalize.
2. Hello!
I’m Shyam Anand.
In the software industry for over 10 years.
Currently Software Architect at Turvo Inc.
Previously have headed engineering for a couple of startups.
mail@shyam-anand.com | linkedin.com/in/shyamanand
3. Introduction
A practical view of denormalization
- When to denormalize
- What strategies can be used
- Considerations before denormalizing
Denormalization can enhance query performance when it is deployed with a
complete understanding of application requirements.
4. Normalization
Optimize for Data Capture
Process of grouping attributes into
refined structures
In accordance with a series of
“normal forms”
To reduce redundancy and improve
data integrity
5. Objectives of Normalization
1. To free the collection of relations from undesirable insertion, update and
deletion dependencies.
2. To reduce the need for restructuring the collection of relations, as new types
of data are introduced, and thus increase the lifespan of application
programs.
3. To make the relational model more informative to users.
4. To make the collection of relations neutral to the query statistics, where
these statistics are liable to change as time goes by.
~ Edgar F. Codd, “Further Normalization of the Data Base Relational Model”
6. Objectives of Normalization
Prevent Insertion, Update, and Deletion anomalies
Minimize redesign when extending the database structure
- A fully normalized database allows its structure to be extended to accommodate new
types of data without changing existing structure too much.
- As a result, applications interacting with the database are minimally affected.
7. First Normal Form (1NF)
- Separate table for each set of related attributes
- Each field is atomic
Before 1NF, the Subjects field holds multiple values:

| Student ID | Student Name | Subjects               |
|------------|--------------|------------------------|
| 100        | Alice        | Databases, Programming |

After 1NF, each field is atomic:

| Student ID | Student Name |
|------------|--------------|
| 100        | Alice        |

| Subject ID | Student ID | Subject     |
|------------|------------|-------------|
| 1          | 100        | Databases   |
| 2          | 100        | Programming |
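The 1NF split above can be sketched with Python's built-in sqlite3 module (table and column names are illustrative, not from the slides):

```python
import sqlite3

# The multivalued "Subjects" column becomes rows in a separate table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE student_subject (
    subject_id INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    subject    TEXT NOT NULL
);
INSERT INTO student VALUES (100, 'Alice');
INSERT INTO student_subject VALUES (1, 100, 'Databases'), (2, 100, 'Programming');
""")

# Each field is now atomic; subjects are queried relationally, not parsed
# out of a comma-separated string.
rows = conn.execute("""
    SELECT s.name, ss.subject
    FROM student s JOIN student_subject ss ON ss.student_id = s.student_id
    ORDER BY ss.subject_id
""").fetchall()
print(rows)  # [('Alice', 'Databases'), ('Alice', 'Programming')]
```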
8. Second Normal Form (2NF)
- Satisfies 1NF
- Every non-prime attribute is dependent on the whole of every candidate key.
Before 2NF, with candidate key (Manufacturer, Model), Country depends on only part of the key:

| Manufacturer | Model  | Country  |
|--------------|--------|----------|
| Maruti       | Brezza | India    |
| Maruti       | Baleno | India    |
| Kia          | Seltos | S. Korea |
| Kia          | Sonet  | S. Korea |

After 2NF:

| Manufacturer | Country  |
|--------------|----------|
| Maruti       | India    |
| Kia          | S. Korea |

| Manufacturer | Model  |
|--------------|--------|
| Maruti       | Brezza |
| Maruti       | Baleno |
| Kia          | Seltos |
| Kia          | Sonet  |
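A minimal sketch of the 2NF decomposition with Python's sqlite3 module (schema names are illustrative), showing why storing Country once per manufacturer avoids update anomalies:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manufacturer (
    manufacturer TEXT PRIMARY KEY,
    country      TEXT NOT NULL
);
CREATE TABLE model (
    manufacturer TEXT NOT NULL REFERENCES manufacturer(manufacturer),
    model        TEXT NOT NULL,
    PRIMARY KEY (manufacturer, model)
);
INSERT INTO manufacturer VALUES ('Maruti', 'India'), ('Kia', 'S. Korea');
INSERT INTO model VALUES ('Maruti', 'Brezza'), ('Maruti', 'Baleno'),
                         ('Kia', 'Seltos');
""")

# Country is stored once per manufacturer: correcting it is a single-row
# change instead of one update per model row.
conn.execute(
    "UPDATE manufacturer SET country = 'South Korea' WHERE manufacturer = 'Kia'")
rows = conn.execute("""
    SELECT m.model, mf.country
    FROM model m JOIN manufacturer mf USING (manufacturer)
    WHERE m.manufacturer = 'Kia'
""").fetchall()
print(rows)  # [('Seltos', 'South Korea')]
```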
9. Third Normal Form (3NF)
- Satisfies 2NF
- All non-key attributes are functionally dependent solely on the primary key.
- No non-key attribute depends transitively on the key through another non-key attribute.
A database relation is described as “normalized” if it meets 3NF.
Most 3NF relations are free of insertion, update, and deletion anomalies.
10. Third Normal Form (3NF)
In the unsplit table, Country depends on the key transitively, through Manufacturer:

| Manufacturer | Model  | Country  |
|--------------|--------|----------|
| Maruti       | Brezza | India    |
| Maruti       | Baleno | India    |
| Kia          | Seltos | S. Korea |
| Kia          | Sonet  | S. Korea |

The decomposition removes the transitive dependency:

| Manufacturer | Country  |
|--------------|----------|
| Maruti       | India    |
| Kia          | S. Korea |

| Manufacturer | Model  |
|--------------|--------|
| Maruti       | Brezza |
| Maruti       | Baleno |
| Kia          | Seltos |
| Kia          | Sonet  |
11. Other Normal Forms
- Boyce/Codd Normal Form (BCNF)
- Elementary Key Normal Form (EKNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
- Essential Tuple Normal Form (ETNF)
- Domain-Key Normal Form (DKNF)
- Sixth Normal Form (6NF)
Mostly academic, not widely implemented
12. Drawbacks
Poor System Performance
Full normalization results in a number of logically separate entities that, in turn, result in even more physically separate stored files. The net effect is that join processing against normalized tables requires additional system resources.
Normalization may also cause significant inefficiencies when there are few updates but many query retrievals involving a large number of join operations.
13. Denormalization
Optimize for Data Access
Process of reducing the degree of
normalization
By adding redundant copies of data
or by grouping data
To improve query performance
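Denormalizing the earlier car example can be sketched as follows (a minimal sqlite3 illustration; the redundant column is an assumption of this sketch): a copy of Country is carried in the model table so reads need no join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE model (
    manufacturer TEXT NOT NULL,
    model        TEXT NOT NULL,
    country      TEXT NOT NULL,  -- redundant copy of the manufacturer's country
    PRIMARY KEY (manufacturer, model)
);
INSERT INTO model VALUES
    ('Maruti', 'Brezza', 'India'), ('Kia', 'Seltos', 'S. Korea');
""")

# Single-table read, no join; the cost is that every copy of 'country'
# must be updated together if a manufacturer's country ever changes.
rows = conn.execute(
    "SELECT model, country FROM model ORDER BY model").fetchall()
print(rows)  # [('Brezza', 'India'), ('Seltos', 'S. Korea')]
```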
14. Objectives of Denormalization
Improve the read performance of a database.
More intuitive data structure for data warehousing.
Put enterprise data at the disposal of organizational decision makers.
Often motivated by performance or scalability in relational database software
needing to carry out very large numbers of read operations.
15. Benefits of Denormalization
Reduces the number of physical tables that must be accessed to retrieve the
data by reducing the number of joins needed.
Provides better performance and a more intuitive data structure for users to
navigate.
Useful in data warehousing implementations for data mining.
17. Snowflake and Star Schemas
Fact tables connected to multiple dimensions.
Snowflake schema has dimensions normalized.
Star schema dimensions are denormalized, with each dimension represented by
a single table.
Snowflake for better data integrity, and Star for better performance.
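A minimal star-schema sketch in sqlite3 (the fact and dimension names below are illustrative, not from the slides): one fact table referencing denormalized, single-table dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);
INSERT INTO dim_date    VALUES (1, '01', 'Jan', 2024);
INSERT INTO dim_product VALUES (10, 'Widget', 'Hardware');
INSERT INTO fact_sales  VALUES (1, 10, 99.5), (1, 10, 0.5);
""")

# A typical analytic query touches the fact table plus one table per
# dimension -- never a chain of normalized sub-dimension joins.
total = conn.execute("""
    SELECT d.year, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d    USING (date_id)
    JOIN dim_product p USING (product_id)
    GROUP BY d.year, p.category
""").fetchone()
print(total)  # (2024, 'Hardware', 100.0)
```

In a snowflake schema, `dim_product` would itself be normalized (e.g. category split into its own table), trading one more join per query for less redundancy.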
18. Performance at a Cost
Denormalization decisions usually involve trade-offs between flexibility and performance.
It is the database designer's responsibility to ensure that the denormalized database does not become inconsistent.
This is done by creating constraints that specify how the redundant copies of information must be kept synchronized, which may easily make the denormalization pointless.
The increase in logical complexity of the database design and the added complexity of the additional constraints make this approach hazardous.
Denormalization speeds up reads, but the constraints needed to keep redundant copies consistent slow down writes.
This means a denormalized database under heavy write load may perform worse than its functionally equivalent normalized counterpart.
20. Addressing Drawbacks
Update anomalies can generally be resolved using triggers, application logic, or batch reconciliation.
Triggers provide the best solution from an integrity point of view, but can be costly in terms of performance.
Application logic can update denormalized data to ensure that changes are
atomic, but this is risky, because the same logic must be used and maintained in
all applications that modify the data.
Batch reconciliation can be run at intervals to bring the data into agreement, but
it can affect system performance.
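The trigger approach can be sketched in sqlite3 (continuing the hypothetical car schema with a redundant country column): a trigger propagates a manufacturer's country change to every copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manufacturer (manufacturer TEXT PRIMARY KEY, country TEXT);
CREATE TABLE model (
    manufacturer TEXT, model TEXT,
    country TEXT,  -- redundant copy kept in sync by the trigger below
    PRIMARY KEY (manufacturer, model)
);
INSERT INTO manufacturer VALUES ('Kia', 'S. Korea');
INSERT INTO model VALUES ('Kia', 'Seltos', 'S. Korea');

CREATE TRIGGER sync_country AFTER UPDATE OF country ON manufacturer
BEGIN
    UPDATE model SET country = NEW.country
    WHERE manufacturer = NEW.manufacturer;
END;
""")

# One write to manufacturer now fans out to every redundant copy --
# integrity is preserved, at the cost of extra work per update.
conn.execute(
    "UPDATE manufacturer SET country = 'South Korea' WHERE manufacturer = 'Kia'")
row = conn.execute(
    "SELECT country FROM model WHERE model = 'Seltos'").fetchone()
print(row)  # ('South Korea',)
```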
21. A Denormalization Process Model
Primary goals are to improve query performance and present a less complex and
more user-oriented view of data.
Denormalization should only be considered when performance is an issue, and
only after a thorough analysis of the various impacted systems.
Data should be first normalized as the design is being conceptualized, and then
denormalized in response to the performance requirements.
22. Criteria for Denormalization
General application performance requirements indicated by business needs.
Online response time requirements for application queries, updates and
processes.
Minimum number of data access paths.
Minimum amount of storage.
23. DB Design Cycle with Denormalization
Development of a conceptual data model (ER diagram)
Refinement and Normalization
Identifying candidates for denormalization
Determining the effect of denormalizing entities on data integrity
Identifying what form the denormalized entity may take.
Map conceptual scheme to physical scheme
24. When Considering Denormalization
Analysis of the advantages and disadvantages of possible implementations is
needed.
It may not be possible to accomplish a full denormalization that meets all
specified criteria.
The database designer should evaluate the degree of importance of each
criterion.
25. Other Considerations of Denormalization
Application performance criteria.
Future application development and
maintenance considerations.
Volatility of application requirements.
Relationships between transactions and the entities
involved.
Transaction type (update/query, OLTP/OLAP).
Transaction frequency.
Access paths needed by each transaction.
Number of rows accessed by each transaction.
Number of pages/blocks accessed by each
transaction.
Cardinality of each relation.
When in doubt, don’t denormalize