RDBMS Denormalization: Benefits and Pitfalls
Hello!
I’m Shyam Anand.
In the software industry for over 10 years.
Currently Software Architect at Turvo Inc.
Previously headed engineering for a couple of startups.
mail@shyam-anand.com | linkedin.com/in/shyamanand
Introduction
A practical view of denormalization
- When to denormalize
- What strategies can be used
- Considerations before denormalizing
Denormalization can enhance query performance when it is deployed with a
complete understanding of application requirements.
Normalization
Optimize for Data Capture
The process of grouping attributes into refined structures, in accordance with a
series of “normal forms”, to reduce redundancy and improve data integrity.
Objectives of Normalization
1. To free the collection of relations from undesirable insertion, update and
deletion dependencies.
2. To reduce the need for restructuring the collection of relations, as new types
of data are introduced, and thus increase the lifespan of application
programs.
3. To make the relational model more informative to users.
4. To make the collection of relations neutral to the query statistics, where
these statistics are liable to change as time goes by.
~ Edgar F. Codd, “Further Normalization of the Data Base Relational Model”
Objectives of Normalization
Prevent Insertion, Update, and Deletion anomalies
Minimize redesign when extending the database structure
- A fully normalized database allows its structure to be extended to accommodate new
types of data without changing existing structure too much.
- As a result, applications interacting with the database are minimally affected.
First Normal Form (1NF)
- Separate table for each set of related attributes
- Each field is atomic
Before:

Student ID   Student Name   Subjects
100          Alice          Databases, Programming

After:

Student ID   Student Name
100          Alice

Subject ID   Student ID   Subject
1            100          Databases
2            100          Programming
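The 1NF split above can be sketched in Python with the standard-library sqlite3 module. This is a minimal illustration, not a prescribed implementation: the table names (student, subject), the column names, and the helper split_subjects are assumptions made for the example.

```python
import sqlite3

def split_subjects(rows):
    """Turn (student_id, name, 'A, B') rows into atomic student and subject rows."""
    students, subjects = [], []
    for student_id, name, subject_list in rows:
        students.append((student_id, name))
        for subject in subject_list.split(","):
            subjects.append((student_id, subject.strip()))
    return students, subjects

# The non-atomic row from the slide.
unnormalized = [(100, "Alice", "Databases, Programming")]
students, subjects = split_subjects(unnormalized)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, student_name TEXT)")
conn.execute(
    "CREATE TABLE subject ("
    " subject_id INTEGER PRIMARY KEY,"
    " student_id INTEGER REFERENCES student(student_id),"
    " subject TEXT)"
)
conn.executemany("INSERT INTO student VALUES (?, ?)", students)
conn.executemany("INSERT INTO subject (student_id, subject) VALUES (?, ?)", subjects)

# Each subject is now its own row and can be queried individually.
alice_subjects = [
    row[0]
    for row in conn.execute(
        "SELECT subject FROM subject WHERE student_id = 100 ORDER BY subject_id"
    )
]
print(alice_subjects)
```

With every field atomic, a question such as "which students take Databases?" becomes a plain equality filter instead of a substring search.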
Second Normal Form (2NF)
- Satisfies 1NF
- Every non-prime attribute is dependent on the whole of every candidate key.
Before:

Manufacturer   Model    Country
Maruti         Brezza   India
Maruti         Baleno   India
Kia            Seltos   S. Korea
Kia            Sonnet   S. Korea

After:

Manufacturer   Country
Maruti         India
Kia            S. Korea

Manufacturer   Model
Maruti         Brezza
Maruti         Baleno
Kia            Seltos
Kia            Sonnet
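As a rough sketch of this decomposition, the two tables can be created in SQLite and joined to recover the original three-column view. The table names (manufacturer, model) and the choice of keys are assumptions for illustration; the data is taken from the slide as given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manufacturer (
    manufacturer TEXT PRIMARY KEY,
    country      TEXT NOT NULL        -- stored once per manufacturer
);
CREATE TABLE model (
    manufacturer TEXT NOT NULL REFERENCES manufacturer(manufacturer),
    model        TEXT NOT NULL,
    PRIMARY KEY (manufacturer, model)
);
INSERT INTO manufacturer VALUES ('Maruti', 'India'), ('Kia', 'S. Korea');
INSERT INTO model VALUES
    ('Maruti', 'Brezza'), ('Maruti', 'Baleno'),
    ('Kia', 'Seltos'), ('Kia', 'Sonnet');
""")

# A join rebuilds the original (Manufacturer, Model, Country) rows.
rows = conn.execute("""
    SELECT m.manufacturer, m.model, f.country
    FROM model AS m
    JOIN manufacturer AS f ON f.manufacturer = m.manufacturer
    ORDER BY m.manufacturer, m.model
""").fetchall()

# The country of a manufacturer is changed in exactly one row.
conn.execute("UPDATE manufacturer SET country = 'South Korea' WHERE manufacturer = 'Kia'")
kia_countries = {
    row[0]
    for row in conn.execute("""
        SELECT f.country FROM model AS m
        JOIN manufacturer AS f ON f.manufacturer = m.manufacturer
        WHERE m.manufacturer = 'Kia'
    """)
}
print(rows)
print(kia_countries)
```

The single-row update is the point of the split: in the original table the same change would have to touch every Kia row, and missing one would leave the data inconsistent.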
Third Normal Form (3NF)
- Satisfies 2NF
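- Every non-prime attribute is functionally dependent solely on the primary key, with no transitive dependency through another non-key attribute.
- Values that repeat across rows point to an attribute that does not depend on the primary key.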
A database relation is described as “normalized” if it meets 3NF.
Most 3NF relations are free of insertion, update, and deletion anomalies.
Third Normal Form (3NF)
Before:

Manufacturer   Model    Country
Maruti         Brezza   India
Maruti         Baleno   India
Kia            Seltos   S. Korea
Kia            Sonnet   S. Korea

After:

Manufacturer   Country
Maruti         India
Kia            S. Korea

Manufacturer   Model
Maruti         Brezza
Maruti         Baleno
Kia            Seltos
Kia            Sonnet
Other Normal Forms
- Boyce-Codd Normal Form (BCNF)
- Elementary Key Normal Form (EKNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
- Essential Tuple Normal Form (ETNF)
- Domain-Key Normal Form (DKNF)
- Sixth Normal Form (6NF)
Mostly academic, not widely implemented
Drawbacks
Poor System Performance
Full normalization results in a number of logically separate entities that, in turn,
result in even more physically separate stored files. The net effect is that join
processing against normalized tables requires additional system resources.
It may also cause significant inefficiencies when there are few updates and many
queries involving a large number of join operations.
Denormalization
Optimize for Data Access
The process of reducing the degree of normalization, by adding redundant copies
of data or by grouping data, to improve query performance.
Objectives of Denormalization
Improve the read performance of a database.
More intuitive data structure for data warehousing.
Put enterprise data at the disposal of organizational decision makers.
Often motivated by performance or scalability in relational database software
needing to carry out very large numbers of read operations.
Benefits of Denormalization
Reduces the number of physical tables that must be accessed to retrieve the
data by reducing the number of joins needed.
Provides better performance and a more intuitive data structure for users to
navigate.
Useful in data warehousing implementations for data mining.
Denormalization Strategies
Collapsing Tables
Splitting Tables (horizontal/vertical)
Adding Redundant Columns (Reference Data)
Derived Attributes (Summary, Total, Balance)
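One of these strategies, a derived attribute, can be sketched as follows. This is an illustrative example only: the orders and order_item tables, the total_cents column, and the add_item helper are hypothetical names, and amounts are stored as integer cents to keep the arithmetic exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    total_cents INTEGER NOT NULL DEFAULT 0  -- derived attribute (redundant)
);
CREATE TABLE order_item (
    order_id    INTEGER NOT NULL REFERENCES orders(order_id),
    price_cents INTEGER NOT NULL
);
INSERT INTO orders (order_id) VALUES (1);
""")

def add_item(conn, order_id, price_cents):
    """Insert an item and update the stored total in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO order_item VALUES (?, ?)", (order_id, price_cents))
        conn.execute(
            "UPDATE orders SET total_cents = total_cents + ? WHERE order_id = ?",
            (price_cents, order_id),
        )

add_item(conn, 1, 1000)
add_item(conn, 1, 750)

# Read path: the total is a single-row lookup, no aggregation over items.
stored_total = conn.execute(
    "SELECT total_cents FROM orders WHERE order_id = 1"
).fetchone()[0]

# The normalized equivalent recomputes the same value on every read.
computed_total = conn.execute(
    "SELECT SUM(price_cents) FROM order_item WHERE order_id = 1"
).fetchone()[0]
print(stored_total, computed_total)
```

The read becomes cheaper, but every write now performs two statements, and any code path that inserts an item without going through add_item leaves the stored total stale.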
Snowflake and Star Schemas
Both connect fact tables to multiple dimensions.
In a snowflake schema, the dimensions are normalized.
In a star schema, the dimensions are denormalized, with each dimension represented by
a single table.
Choose snowflake for better data integrity, and star for better performance.
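A small star-style layout might look like the sketch below. The tables fact_sales and dim_product, the category values, and the quantities are all made up for illustration. The category name is stored inline in the dimension, where a snowflake design would move it to a separate category table and add a second join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-style dimension: category_name is repeated per product (denormalized).
CREATE TABLE dim_product (
    product_id    INTEGER PRIMARY KEY,
    product_name  TEXT NOT NULL,
    category_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    quantity   INTEGER NOT NULL
);
INSERT INTO dim_product VALUES
    (1, 'Brezza', 'SUV'), (2, 'Baleno', 'Hatchback'), (3, 'Seltos', 'SUV');
INSERT INTO fact_sales VALUES (1, 5), (2, 3), (3, 4), (1, 2);
""")

# One join from the fact table to the dimension answers the question.
sales_by_category = conn.execute("""
    SELECT d.category_name, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category_name
    ORDER BY d.category_name
""").fetchall()
print(sales_by_category)
```

Renaming a category in this layout means updating every product row that carries it, which is the integrity cost the star schema accepts in exchange for fewer joins.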
Performance at a Cost
Denormalization decisions usually involve the trade-offs between flexibility and performance.
It is the database designer's responsibility to ensure that the denormalized database does not become
inconsistent.
This is done by creating constraints that specify how the redundant copies of information must be kept
synchronized, which may easily make the denormalization procedure pointless.
The increase in logical complexity of the database design and the added complexity of the additional
constraints make this approach hazardous.
Constraints introduce a trade-off, speeding up reads while slowing down writes.
This means a denormalized database under heavy write load may offer worse performance than its
functionally equivalent normalized counterpart.
Drawbacks
Data duplication
More complex data-integrity rules
Update anomalies
Increased difficulty in expressing the type of access
Addressing Drawbacks
Update anomalies can generally be resolved using triggers, application logic,
or batch reconciliation.
Triggers provide the best solution from an integrity point of view, but can be
costly in terms of performance.
Application logic can update denormalized data to ensure that changes are
atomic, but this is risky, because the same logic must be used and maintained in
all applications that modify the data.
Batch reconciliation can be run at intervals to bring the data into agreement, but
it can affect system performance.
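Two of these approaches, a trigger and a batch reconciliation pass, can be sketched with SQLite. This is a minimal illustration under assumed names: the redundant country column on model, the trigger sync_model_country, and the helpers find_mismatches and reconcile are not from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manufacturer (
    manufacturer TEXT PRIMARY KEY,
    country      TEXT NOT NULL
);
CREATE TABLE model (
    manufacturer TEXT NOT NULL REFERENCES manufacturer(manufacturer),
    model        TEXT NOT NULL,
    country      TEXT NOT NULL,  -- redundant copy of manufacturer.country
    PRIMARY KEY (manufacturer, model)
);
-- Trigger: propagate a country change to every redundant copy.
CREATE TRIGGER sync_model_country
AFTER UPDATE OF country ON manufacturer
BEGIN
    UPDATE model SET country = NEW.country
    WHERE manufacturer = NEW.manufacturer;
END;
INSERT INTO manufacturer VALUES ('Maruti', 'India'), ('Kia', 'S. Korea');
INSERT INTO model VALUES
    ('Maruti', 'Brezza', 'India'), ('Kia', 'Seltos', 'S. Korea');
""")

def find_mismatches(conn):
    """Return (manufacturer, model) pairs whose redundant copy has drifted."""
    return conn.execute("""
        SELECT m.manufacturer, m.model
        FROM model AS m
        JOIN manufacturer AS f ON f.manufacturer = m.manufacturer
        WHERE m.country <> f.country
        ORDER BY m.manufacturer, m.model
    """).fetchall()

def reconcile(conn):
    """Batch reconciliation: overwrite every copy from the source of truth."""
    with conn:
        conn.execute("""
            UPDATE model SET country = (
                SELECT country FROM manufacturer
                WHERE manufacturer.manufacturer = model.manufacturer
            )
        """)

# The trigger keeps the copies in sync when the source row changes.
conn.execute("UPDATE manufacturer SET country = 'South Korea' WHERE manufacturer = 'Kia'")
after_trigger = find_mismatches(conn)

# Simulated drift: a write that bypasses the source table.
conn.execute("UPDATE model SET country = 'Japan' WHERE model = 'Seltos'")
drift = find_mismatches(conn)

reconcile(conn)
after_reconcile = find_mismatches(conn)
print(after_trigger, drift, after_reconcile)
```

The trigger only covers changes made to the source table, so a direct write to the copy still drifts until the reconciliation pass runs. That gap is one reason the approaches are often combined.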
A Denormalization Process Model
Primary goals are to improve query performance and present a less complex and
more user-oriented view of data.
Denormalization should only be considered when performance is an issue, and
only after a thorough analysis of the various impacted systems.
Data should be first normalized as the design is being conceptualized, and then
denormalized in response to the performance requirements.
Criteria for Denormalization
General application performance requirements indicated by business needs.
Online response time requirements for application queries, updates and
processes.
Minimum number of data access paths.
Minimum amount of storage.
DB Design Cycle with Denormalization
Development of a conceptual data model (ER diagram)
Refinement and Normalization
Identifying candidates for denormalization
Determining the effect of denormalizing entities on data integrity
Identifying what form the denormalized entity may take.
Mapping the conceptual schema to the physical schema
When Considering Denormalization
Analysis of the advantages and disadvantages of possible implementations is
needed.
It may not be possible to accomplish a full denormalization that meets all
specified criteria.
The database designer should evaluate the degree of importance of each
criterion.
Other Considerations of Denormalization
Application performance criteria.
Future application development and maintenance considerations.
Volatility of application requirements.
Relations between transactions and relations of entities involved.
Transaction type (update/query, OLTP/OLAP).
Transaction frequency.
Access paths needed by each transaction.
Number of rows accessed by each transaction.
Number of pages/blocks accessed by each transaction.
Cardinality of each relation.
When in doubt, don’t denormalize
Thank you!

RDBMS Denormalization - Benefits & Pitfalls

  • 2. Hello! I’m Shyam Anand. In the software industry for over 10 years. Currently Software Architect at Turvo Inc. Previously have headed engineering for a couple of startups. mail@shyam-anand.com | linkedin.com/in/shyamanand
  • 3. Introduction A practical view of denormalization - When to denormalize - What strategies can be used - Considerations before denormalizing Denormalization can enhance query performance when it is deployed with a complete understanding of application requirements.
  • 4. Normalization Optimize for Data Capture Process of grouping attributes into refined structures In accordance with a series of “normal forms” To reduce redundancy and improve data integrity
  • 5. Objectives of Normalization 1. To free the collection of relations from undesirable insertion, update and deletion dependencies. 2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the lifespan of application programs. 3. To make the relational model more informative to users. 4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by. ~ Edgar F. Codd, “Further Normalization of the Data Base Relational Model”
  • 6. Objectives of Normalization - Prevent insertion, update, and deletion anomalies. - Minimize redesign when extending the database structure: a fully normalized database allows its structure to be extended to accommodate new types of data without changing the existing structure too much. As a result, applications interacting with the database are minimally affected.
  • 7. First Normal Form (1NF) - Separate table for each set of related attributes - Each field is atomic
      Before (not in 1NF):
        Student ID | Student Name | Subjects
        100        | Alice        | Databases, Programming
      After (1NF):
        Student ID | Student Name
        100        | Alice

        Subject ID | Student ID | Subject
        1          | 100        | Databases
        2          | 100        | Programming
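The 1NF split on this slide can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the author's code: the table names `student` and `student_subject` and the `raw_rows` list are assumptions made up for the example.

```python
import sqlite3

# Hypothetical unnormalized input, mirroring the slide: one row whose
# "Subjects" field holds a comma-separated list (not atomic).
raw_rows = [(100, "Alice", "Databases, Programming")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL
    );
    CREATE TABLE student_subject (
        subject_id INTEGER PRIMARY KEY,  -- assigned automatically by SQLite
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        subject    TEXT NOT NULL
    );
""")

for student_id, name, subjects in raw_rows:
    conn.execute("INSERT INTO student VALUES (?, ?)", (student_id, name))
    # Split the multi-valued field so each stored value is atomic.
    for subject in subjects.split(","):
        conn.execute(
            "INSERT INTO student_subject (student_id, subject) VALUES (?, ?)",
            (student_id, subject.strip()),
        )

subject_rows = conn.execute(
    "SELECT subject_id, student_id, subject "
    "FROM student_subject ORDER BY subject_id"
).fetchall()
print(subject_rows)  # [(1, 100, 'Databases'), (2, 100, 'Programming')]
```

Once each subject is its own row, a query such as "which students take Databases" becomes a plain equality filter instead of a substring search.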
  • 8. Second Normal Form (2NF) - Satisfies 1NF - Every non-prime attribute is dependent on the whole of every candidate key.
      Before:
        Manufacturer | Model  | Country
        Maruti       | Brezza | India
        Maruti       | Baleno | India
        Kia          | Seltos | S. Korea
        Kia          | Sonet  | S. Korea
      After:
        Manufacturer | Country
        Maruti       | India
        Kia          | S. Korea

        Manufacturer | Model
        Maruti       | Brezza
        Maruti       | Baleno
        Kia          | Seltos
        Kia          | Sonet
  • 9. Third Normal Form (3NF) - Satisfies 2NF - All non-key attributes are functionally dependent solely on the primary key, with no transitive dependencies through other non-key attributes. - Repeating values are not dependent on a primary key. A database relation is often described as “normalized” if it meets 3NF. Most 3NF relations are free of insertion, update, and deletion anomalies.
  • 10. Third Normal Form (3NF)
      Before:
        Manufacturer | Model  | Country
        Maruti       | Brezza | India
        Maruti       | Baleno | India
        Kia          | Seltos | S. Korea
        Kia          | Sonet  | S. Korea
      After:
        Manufacturer | Country
        Maruti       | India
        Kia          | S. Korea

        Manufacturer | Model
        Maruti       | Brezza
        Maruti       | Baleno
        Kia          | Seltos
        Kia          | Sonet
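A small sketch of why the decomposed tables avoid update anomalies, using Python's sqlite3 module. The table and column names (`manufacturer`, `model`, `name`) are assumptions chosen for the example. Because the country is stored once per manufacturer, changing it touches a single row, and a join still reproduces the original three-column view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Country depends only on the manufacturer, so it gets its own table.
    CREATE TABLE manufacturer (
        name    TEXT PRIMARY KEY,
        country TEXT NOT NULL
    );
    CREATE TABLE model (
        manufacturer TEXT NOT NULL REFERENCES manufacturer(name),
        model        TEXT NOT NULL,
        PRIMARY KEY (manufacturer, model)
    );
    INSERT INTO manufacturer VALUES ('Maruti', 'India'), ('Kia', 'S. Korea');
    INSERT INTO model VALUES ('Maruti', 'Brezza'), ('Maruti', 'Baleno'),
                             ('Kia', 'Seltos'), ('Kia', 'Sonet');
""")

# Renaming the country is a one-row update; no model row is touched,
# so the copies cannot drift apart.
cur = conn.execute(
    "UPDATE manufacturer SET country = 'South Korea' WHERE name = 'Kia'"
)
rows_touched = cur.rowcount

# The original wide table is recovered with a join.
joined = conn.execute("""
    SELECT m.manufacturer, m.model, f.country
    FROM model AS m
    JOIN manufacturer AS f ON f.name = m.manufacturer
    ORDER BY m.manufacturer, m.model
""").fetchall()

print(rows_touched)  # 1
for row in joined:
    print(row)
```

In the single wide table, the same change would have to be applied to every Kia row, and missing one would leave the data inconsistent.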
  • 11. Other Normal Forms - Boyce-Codd Normal Form (BCNF) - Elementary Key Normal Form (EKNF) - Fourth Normal Form (4NF) - Fifth Normal Form (5NF) - Essential Tuple Normal Form (ETNF) - Domain-Key Normal Form (DKNF) - Sixth Normal Form (6NF). Mostly academic, not widely implemented.
  • 12. Drawbacks - Poor System Performance. Full normalization results in a number of logically separate entities that, in turn, result in even more physically separate stored files. The net effect is that join processing against normalized tables requires additional system resources. It may also cause significant inefficiencies when there are few updates and many query retrievals involving a large number of join operations.
  • 13. Denormalization - Optimize for Data Access - Process of reducing the degree of normalization, by adding redundant copies of data or by grouping data, to improve query performance.
  • 14. Objectives of Denormalization Improve the read performance of a database. More intuitive data structure for data warehousing. Put enterprise data at the disposal of organizational decision makers. Often motivated by performance or scalability in relational database software needing to carry out very large numbers of read operations.
  • 15. Benefits of Denormalization Reduces the number of physical tables that must be accessed to retrieve the data by reducing the number of joins needed. Provides better performance and a more intuitive data structure for users to navigate. Useful in data warehousing implementations for data mining.
  • 16. Denormalization Strategies - Collapsing Tables - Splitting Tables (horizontal/vertical) - Adding Redundant Columns (Reference Data) - Derived Attributes (Summary, Total, Balance)
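Two of these strategies, a redundant column and a derived attribute, can be sketched as follows with Python's sqlite3 module. The schema is hypothetical (`customer`, `orders`, `order_line`, with amounts stored as integer cents); it only illustrates the idea that the denormalized read needs no joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL REFERENCES customer(customer_id),
        customer_name TEXT NOT NULL,              -- redundant column (copy of customer.name)
        order_total   INTEGER NOT NULL DEFAULT 0  -- derived attribute (sum of line amounts)
    );
    CREATE TABLE order_line (
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        amount   INTEGER NOT NULL                 -- cents
    );
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO orders (order_id, customer_id, customer_name) VALUES (10, 1, 'Alice');
    INSERT INTO order_line VALUES (10, 2000), (10, 550);
""")

# Maintain the derived total from the detail rows.
conn.execute("""
    UPDATE orders
    SET order_total = (
        SELECT COALESCE(SUM(amount), 0)
        FROM order_line
        WHERE order_line.order_id = orders.order_id
    )
""")

# Normalized read: two joins plus an aggregate.
normalized = conn.execute("""
    SELECT o.order_id, c.name, SUM(l.amount)
    FROM orders AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    JOIN order_line AS l ON l.order_id = o.order_id
    GROUP BY o.order_id, c.name
""").fetchall()

# Denormalized read: a single-table lookup.
denormalized = conn.execute(
    "SELECT order_id, customer_name, order_total FROM orders"
).fetchall()

print(normalized)    # [(10, 'Alice', 2550)]
print(denormalized)  # [(10, 'Alice', 2550)]
```

Both reads return the same answer only as long as the copies are kept in sync, which is the cost discussed on the later slides.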
  • 17. Snowflake and Star Schemas Fact tables connected to multiple dimensions. Snowflake schema has dimensions normalized. Star schema dimensions are denormalized, with each dimension represented by a single table. Snowflake for better data integrity, and Star for better performance.
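The difference can be illustrated with one fact table and a product dimension modeled both ways. This is a sketch with invented table names and data (`fact_sales`, `sf_product`, `sf_category`, `star_product`), using Python's sqlite3 module. The star version answers the per-category question with one join, the snowflake version with two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (product_id INTEGER NOT NULL, quantity INTEGER NOT NULL);

    -- Snowflake: the product dimension is normalized into product + category.
    CREATE TABLE sf_category (category_id INTEGER PRIMARY KEY, category_name TEXT NOT NULL);
    CREATE TABLE sf_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category_id  INTEGER NOT NULL REFERENCES sf_category(category_id)
    );

    -- Star: one denormalized table per dimension; the category name repeats.
    CREATE TABLE star_product (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT NOT NULL,
        category_name TEXT NOT NULL
    );

    INSERT INTO sf_category VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO sf_product VALUES (1, 'Novel', 1), (2, 'Atlas', 1), (3, 'Robot', 2);
    INSERT INTO star_product VALUES (1, 'Novel', 'Books'), (2, 'Atlas', 'Books'),
                                    (3, 'Robot', 'Toys');
    INSERT INTO fact_sales VALUES (1, 2), (2, 1), (3, 4), (1, 3);
""")

star_totals = conn.execute("""
    SELECT p.category_name, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN star_product AS p ON p.product_id = f.product_id
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()

snowflake_totals = conn.execute("""
    SELECT c.category_name, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN sf_product AS p ON p.product_id = f.product_id
    JOIN sf_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

print(star_totals)       # [('Books', 6), ('Toys', 4)]
print(snowflake_totals)  # [('Books', 6), ('Toys', 4)]
```

Renaming a category is a one-row change in the snowflake layout but touches every matching product row in the star layout, which is the integrity-versus-performance trade-off the slide describes.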
  • 18. Performance at a Cost Denormalization decisions usually involve trade-offs between flexibility and performance. It is the database designer's responsibility to ensure that the denormalized database does not become inconsistent. This is done by creating constraints that specify how the redundant copies of information must be kept synchronized, which may easily make the denormalization procedure pointless. The increase in logical complexity of the database design and the added complexity of the additional constraints make this approach hazardous. Constraints introduce a trade-off, speeding up reads while slowing down writes. This means a denormalized database under heavy write load may offer worse performance than its functionally equivalent normalized counterpart.
  • 19. Drawbacks - Data duplication - More complex data-integrity rules - Update anomalies - Increased difficulty in expressing the type of access
  • 20. Addressing Drawbacks Update anomalies can generally be resolved by using triggers, application logic, and batch reconciliation. Triggers provide the best solution from an integrity point of view, but can be costly in terms of performance. Application logic can update denormalized data to ensure that changes are atomic, but this is risky, because the same logic must be used and maintained in all applications that modify the data. Batch reconciliation can be run at intervals to bring the data into agreement, but it can affect system performance.
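Two of these approaches, a trigger and a batch reconciliation pass, might look like the following in SQLite via Python's sqlite3 module. The schema, the trigger name `sync_customer_name`, and the `reconcile` helper are all hypothetical; other database engines use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL REFERENCES customer(customer_id),
        customer_name TEXT NOT NULL  -- redundant copy of customer.name
    );

    -- Trigger: propagate a name change to every redundant copy.
    CREATE TRIGGER sync_customer_name
    AFTER UPDATE OF name ON customer
    BEGIN
        UPDATE orders SET customer_name = NEW.name
        WHERE customer_id = NEW.customer_id;
    END;

    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO orders VALUES (10, 1, 'Alice'), (11, 1, 'Alice');
""")

def order_names(conn):
    """Return the redundant customer_name of every order, by order_id."""
    rows = conn.execute("SELECT customer_name FROM orders ORDER BY order_id")
    return [name for (name,) in rows]

def reconcile(conn):
    """Batch reconciliation: overwrite stale copies, return rows fixed."""
    cur = conn.execute("""
        UPDATE orders
        SET customer_name = (SELECT name FROM customer
                             WHERE customer.customer_id = orders.customer_id)
        WHERE customer_name <> (SELECT name FROM customer
                                WHERE customer.customer_id = orders.customer_id)
    """)
    return cur.rowcount

# The trigger keeps both copies in step with the source row.
conn.execute("UPDATE customer SET name = 'Alice Smith' WHERE customer_id = 1")
names_after_trigger = order_names(conn)

# Simulate drift from a write path that bypassed the source table.
conn.execute("UPDATE orders SET customer_name = 'stale' WHERE order_id = 11")
rows_fixed = reconcile(conn)
names_after_reconcile = order_names(conn)

print(names_after_trigger)    # ['Alice Smith', 'Alice Smith']
print(rows_fixed)             # 1
print(names_after_reconcile)  # ['Alice Smith', 'Alice Smith']
```

The trigger adds work to every write on `customer`, while the reconciliation pass leaves a window in which stale copies can be read; which cost is acceptable depends on the workload.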
  • 21. A Denormalization Process Model Primary goals are to improve query performance and present a less complex and more user-oriented view of data. Denormalization should only be considered when performance is an issue, and only after there has been a thorough analysis of the various impacted systems. Data should first be normalized as the design is being conceptualized, and then denormalized in response to the performance requirements.
  • 22. Criteria for Denormalization - General application performance requirements indicated by business needs. - Online response time requirements for application queries, updates and processes. - Minimum number of data access paths. - Minimum amount of storage.
  • 23. DB Design Cycle with Denormalization 1. Development of a conceptual data model (ER diagram). 2. Refinement and normalization. 3. Identifying candidates for denormalization. 4. Determining the effect of denormalizing entities on data integrity. 5. Identifying what form the denormalized entity may take. 6. Mapping the conceptual schema to the physical schema.
  • 24. When Considering Denormalization Analysis of the advantages and disadvantages of possible implementations is needed. It may not be possible to accomplish a full denormalization that meets all specified criteria. The database designer should evaluate the degree of importance of each criterion.
  • 25. Other Considerations of Denormalization - Application performance criteria. - Future application development and maintenance considerations. - Volatility of application requirements. - Relations between transactions and relations of entities involved. - Transaction type (update/query, OLTP/OLAP). - Transaction frequency. - Access paths needed by each transaction. - Number of rows accessed by each transaction. - Number of pages/blocks accessed by each transaction. - Cardinality of each relation. When in doubt, don’t denormalize.