RDBMS
Denormalization
Benefits and Pitfalls
Hello!
I’m Shyam Anand.
In the software industry for over 10 years.
Currently Software Architect at Turvo Inc.
Previously have headed engineering for a couple of startups.
mail@shyam-anand.com | linkedin.com/in/shyamanand
Introduction
A practical view of denormalization
- When to denormalize
- What strategies can be used
- Considerations before denormalizing
Denormalization can enhance query performance when it is deployed with a
complete understanding of application requirements.
Normalization
Optimize for Data Capture
Process of grouping attributes into
refined structures
In accordance with a series of
“normal forms”
To reduce redundancy and improve
data integrity
Objectives of Normalization
1. To free the collection of relations from undesirable insertion, update and
deletion dependencies.
2. To reduce the need for restructuring the collection of relations, as new types
of data are introduced, and thus increase the lifespan of application
programs.
3. To make the relational model more informative to users.
4. To make the collection of relations neutral to the query statistics, where
these statistics are liable to change as time goes by.
~ Edgar F. Codd, “Further Normalization of the Data Base Relational Model”
Objectives of Normalization
Prevent Insertion, Update, and Deletion anomalies
Minimize redesign when extending the database structure
- A fully normalized database allows its structure to be extended to accommodate new
types of data without changing existing structure too much.
- As a result, applications interacting with the database are minimally affected.
First Normal Form (1NF)
- Separate table for each set of related attributes
- Each field is atomic
Before:
Student ID | Student Name | Subjects
100        | Alice        | Databases, Programming

After:
Student ID | Student Name
100        | Alice

Subject ID | Student ID | Subject
1          | 100        | Databases
2          | 100        | Programming
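The split above can be sketched with Python's standard sqlite3 module. This is only an illustration: the table and column names (student, student_subject) are assumptions made for the sketch and do not come from the slides.

```python
import sqlite3

# Hypothetical table and column names, chosen only for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL
    );
    CREATE TABLE student_subject (
        subject_id INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        subject    TEXT NOT NULL
    );
""")

# The unnormalized row: one field holding a comma-separated list.
raw_rows = [(100, "Alice", "Databases, Programming")]

for student_id, name, subjects in raw_rows:
    conn.execute("INSERT INTO student VALUES (?, ?)", (student_id, name))
    # Split the multi-valued field so each subject gets its own row.
    for subject in subjects.split(","):
        conn.execute(
            "INSERT INTO student_subject (student_id, subject) VALUES (?, ?)",
            (student_id, subject.strip()),
        )

subject_rows = conn.execute(
    "SELECT subject_id, student_id, subject "
    "FROM student_subject ORDER BY subject_id"
).fetchall()
print(subject_rows)
```

Each subject now sits in its own atomic field, so a query such as "who takes Databases" no longer needs string matching.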
Second Normal Form (2NF)
- Satisfies 1NF
- Every non-prime attribute is dependent on the whole of every candidate key.

Before:
Manufacturer | Model  | Country
Maruti       | Brezza | India
Maruti       | Baleno | India
Kia          | Seltos | S. Korea
Kia          | Sonnet | S. Korea

After:
Manufacturer | Country
Maruti       | India
Kia          | S. Korea

Manufacturer | Model
Maruti       | Brezza
Maruti       | Baleno
Kia          | Seltos
Kia          | Sonnet
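A minimal sketch of this decomposition, again using sqlite3. It assumes the candidate key of the flat table is (Manufacturer, Model), which the slide implies but does not state. The table names car_flat, manufacturer and model are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed key: (manufacturer, model). Table names are hypothetical.
    CREATE TABLE car_flat (manufacturer TEXT, model TEXT, country TEXT,
                           PRIMARY KEY (manufacturer, model));
    INSERT INTO car_flat VALUES
        ('Maruti', 'Brezza', 'India'),
        ('Maruti', 'Baleno', 'India'),
        ('Kia', 'Seltos', 'S. Korea'),
        ('Kia', 'Sonnet', 'S. Korea');

    -- Country depends on manufacturer alone, so it moves to its own table.
    CREATE TABLE manufacturer AS
        SELECT DISTINCT manufacturer, country FROM car_flat;
    CREATE TABLE model AS
        SELECT manufacturer, model FROM car_flat;
""")

flat = conn.execute(
    "SELECT manufacturer, model, country FROM car_flat "
    "ORDER BY manufacturer, model"
).fetchall()

# Joining the two smaller tables should give back the original rows.
rejoined = conn.execute("""
    SELECT m.manufacturer, m.model, f.country
    FROM model AS m
    JOIN manufacturer AS f ON f.manufacturer = m.manufacturer
    ORDER BY m.manufacturer, m.model
""").fetchall()

manufacturer_count = conn.execute(
    "SELECT COUNT(*) FROM manufacturer"
).fetchone()[0]
print(rejoined == flat, manufacturer_count)
```

The country is now stored once per manufacturer instead of once per model, and the join shows that no information was lost in the split.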
Third Normal Form (3NF)
- Satisfies 2NF
- Every non-prime attribute is functionally dependent solely on the primary key.
- Attributes that depend on something other than the primary key (a transitive dependency) are moved to a separate table.
A database relation is described as “normalized” if it meets 3NF.
Most 3NF relations are free of insertion, update, and deletion anomalies.
Third Normal Form (3NF)
Before:
Manufacturer | Model  | Country
Maruti       | Brezza | India
Maruti       | Baleno | India
Kia          | Seltos | S. Korea
Kia          | Sonnet | S. Korea

After:
Manufacturer | Country
Maruti       | India
Kia          | S. Korea

Manufacturer | Model
Maruti       | Brezza
Maruti       | Baleno
Kia          | Seltos
Kia          | Sonnet
Other Normal Forms
- Boyce/Codd Normal Form (BCNF)
- Elementary Key Normal Form (EKNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
- Essential Tuple Normal Form (ETNF)
- Domain-Key Normal Form (DKNF)
- Sixth Normal Form (6NF)
Mostly academic, not widely implemented
Drawbacks
Poor System Performance
A full normalization results in a number of logically separate entities that, in turn,
result in even more physically separate stored files. The net effect is that join
processing against normalized tables requires an additional amount of system
resources.
Full normalization may also cause significant inefficiencies when there are few updates
and many query retrievals involving a large number of join operations.
Denormalization
Optimize for Data Access
Process of reducing the degree of
normalization
By adding redundant copies of data
or by grouping data
To improve query performance
Objectives of Denormalization
Improve the read performance of a database.
More intuitive data structure for data warehousing.
Put enterprise data at the disposal of organizational decision makers.
Often motivated by performance or scalability in relational database software
needing to carry out very large numbers of read operations.
Benefits of Denormalization
Reduces the number of physical tables that must be accessed to retrieve the
data by reducing the number of joins needed.
Provides better performance and a more intuitive data structure for users to
navigate.
Useful in data warehousing implementations for data mining.
Denormalization Strategies
Collapsing Tables
Splitting Tables (horizontal/vertical)
Adding Redundant Columns (Reference Data)
Derived Attributes (Summary, Total, Balance)
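Two of these strategies, a redundant column and a derived attribute, can be sketched as follows. The schema (customer, orders, order_line) and its data are hypothetical and exist only to show the idea. Amounts are whole numbers to keep the arithmetic exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema for illustration only.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER REFERENCES customer(customer_id),
        customer_name TEXT,     -- redundant column copied from customer
        order_total   INTEGER   -- derived attribute: sum of line amounts
    );
    CREATE TABLE order_line (
        order_id INTEGER REFERENCES orders(order_id),
        amount   INTEGER
    );
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO orders (order_id, customer_id) VALUES (10, 1);
    INSERT INTO order_line VALUES (10, 250), (10, 100);

    -- Populate the denormalized columns from the normalized source tables.
    UPDATE orders SET
        customer_name = (SELECT name FROM customer
                         WHERE customer.customer_id = orders.customer_id),
        order_total   = (SELECT SUM(amount) FROM order_line
                         WHERE order_line.order_id = orders.order_id);
""")

# Normalized read: two joins plus an aggregate.
normalized = conn.execute("""
    SELECT o.order_id, c.name, SUM(l.amount)
    FROM orders o
    JOIN customer c ON c.customer_id = o.customer_id
    JOIN order_line l ON l.order_id = o.order_id
    GROUP BY o.order_id, c.name
""").fetchall()

# Denormalized read: a single-table lookup.
denormalized = conn.execute(
    "SELECT order_id, customer_name, order_total FROM orders"
).fetchall()
print(normalized, denormalized)
```

Both reads return the same row. The denormalized one avoids the joins, but the copied name and stored total must now be kept in step with their sources.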
Snowflake and Star Schemas
Fact tables connected to multiple dimensions.
Snowflake schema has dimensions normalized.
Star schema dimensions are denormalized, with each dimension represented by
a single table.
Snowflake for better data integrity, and Star for better performance.
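The difference can be sketched with one fact table and a product dimension stored both ways. All table names and figures below are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflake: the product dimension is normalized into two tables.
    CREATE TABLE category_sf (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE product_sf (
        product_id  INTEGER PRIMARY KEY,
        product     TEXT,
        category_id INTEGER REFERENCES category_sf(category_id)
    );

    -- Star: the same dimension flattened into a single table.
    CREATE TABLE product_star (
        product_id INTEGER PRIMARY KEY,
        product    TEXT,
        category   TEXT
    );

    -- One fact table, usable with either dimension layout.
    CREATE TABLE sales_fact (product_id INTEGER, units INTEGER);

    -- Made-up sample data.
    INSERT INTO category_sf VALUES (1, 'SUV'), (2, 'Hatchback');
    INSERT INTO product_sf VALUES
        (1, 'Brezza', 1), (2, 'Seltos', 1), (3, 'Baleno', 2);
    INSERT INTO product_star VALUES
        (1, 'Brezza', 'SUV'), (2, 'Seltos', 'SUV'), (3, 'Baleno', 'Hatchback');
    INSERT INTO sales_fact VALUES (1, 5), (2, 3), (3, 4), (1, 2);
""")

# Snowflake query: fact -> product -> category (two joins).
snowflake = conn.execute("""
    SELECT c.category, SUM(f.units)
    FROM sales_fact f
    JOIN product_sf p ON p.product_id = f.product_id
    JOIN category_sf c ON c.category_id = p.category_id
    GROUP BY c.category ORDER BY c.category
""").fetchall()

# Star query: fact -> product (one join).
star = conn.execute("""
    SELECT p.category, SUM(f.units)
    FROM sales_fact f
    JOIN product_star p ON p.product_id = f.product_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()
print(snowflake, star)
```

The star layout answers the same question with one join fewer, at the price of repeating the category name on every product row.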
Performance at a Cost
Denormalization decisions usually involve trade-offs between flexibility and performance.
It is the database designer's responsibility to ensure that the denormalized database does not become
inconsistent.
This is done by creating constraints that specify how the redundant copies of information must be kept
synchronized, which may easily make the denormalization procedure pointless.
The increase in logical complexity of the database design and the added complexity of the additional
constraints make this approach hazardous.
Constraints introduce a trade-off, speeding up reads while slowing down writes.
This means a denormalized database under heavy write load may offer worse performance than its
functionally equivalent normalized counterpart.
Drawbacks
Data duplication
More complex data-integrity rules
Update anomalies
Increased difficulty in expressing the type of access
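An update anomaly is easy to reproduce. The sketch below uses a hypothetical flat table, car_flat, that repeats the country on every model row, and a write that changes only one of the copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical denormalized table: country repeated per model.
    CREATE TABLE car_flat (manufacturer TEXT, model TEXT, country TEXT);
    INSERT INTO car_flat VALUES
        ('Kia', 'Seltos', 'S. Korea'),
        ('Kia', 'Sonnet', 'S. Korea');
""")

# A write that touches only one of the redundant copies.
conn.execute(
    "UPDATE car_flat SET country = 'South Korea' WHERE model = 'Seltos'"
)

# Manufacturers that now map to more than one country value.
inconsistent = conn.execute("""
    SELECT manufacturer, COUNT(DISTINCT country)
    FROM car_flat
    GROUP BY manufacturer
    HAVING COUNT(DISTINCT country) > 1
""").fetchall()
print(inconsistent)
```

The same manufacturer now carries two different country values, and nothing in the schema prevents it.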
Addressing Drawbacks
Update anomalies can generally be resolved by using triggers, application logic,
and batch reconciliation.
Triggers provide the best solution from an integrity point of view, but can be
costly in terms of performance.
Application logic can update denormalized data to ensure that changes are
atomic, but this is risky, because the same logic must be used and maintained in
all applications that modify the data.
Batch reconciliation can be run at intervals to bring the data into agreement, but
it can affect system performance.
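Two of these approaches, a trigger and a batch reconciliation job, might look roughly like this in SQLite. The schema, the trigger name sync_customer_name and the reconcile helper are all assumptions made for the sketch. Other database engines use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: orders carries a redundant copy of the name.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, customer_name TEXT);
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO orders VALUES (10, 1, 'Alice'), (11, 1, 'Alice');

    -- Trigger: propagate a name change to every redundant copy.
    CREATE TRIGGER sync_customer_name
    AFTER UPDATE OF name ON customer
    BEGIN
        UPDATE orders SET customer_name = NEW.name
        WHERE customer_id = NEW.customer_id;
    END;
""")

conn.execute("UPDATE customer SET name = 'Alice B.' WHERE customer_id = 1")
after_trigger = conn.execute(
    "SELECT customer_name FROM orders ORDER BY order_id"
).fetchall()


def reconcile(connection):
    """Batch reconciliation: rewrite copies that drifted from the source."""
    cursor = connection.execute("""
        UPDATE orders
        SET customer_name = (SELECT name FROM customer
                             WHERE customer.customer_id = orders.customer_id)
        WHERE customer_name IS NOT
              (SELECT name FROM customer
               WHERE customer.customer_id = orders.customer_id)
    """)
    return cursor.rowcount  # number of rows that had to be repaired


# Simulate drift that bypassed the trigger, then repair it in a batch.
conn.execute("DROP TRIGGER sync_customer_name")
conn.execute("UPDATE customer SET name = 'Alice C.' WHERE customer_id = 1")
fixed = reconcile(conn)
print(after_trigger, fixed)
```

The trigger keeps the copies correct on every write, which is where its cost is paid. The batch job lets them drift between runs and pays the cost in one sweep.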
A Denormalization Process Model
Primary goals are to improve query performance and present a less complex and
more user-oriented view of data.
Denormalization should be only considered when performance is an issue, and
only after there has been a thorough analysis of the various impacted systems.
Data should be first normalized as the design is being conceptualized, and then
denormalized in response to the performance requirements.
Criteria for Denormalization
General application performance requirements indicated by business needs.
Online response time requirements for application queries, updates and
processes.
Minimum number of data access paths.
Minimum amount of storage.
DB Design Cycle with Denormalization
Development of a conceptual data model (ER diagram)
Refinement and Normalization
Identifying candidates for denormalization
Determining the effect of denormalizing entities on data integrity
Identifying what form the denormalized entity may take.
Mapping the conceptual schema to the physical schema
When Considering Denormalization
Analysis of the advantages and disadvantages of possible implementations is
needed.
It may not be possible to accomplish a full denormalization that meets all
specified criteria.
The database designer should evaluate the degree of importance of each
criterion.
Other Considerations of Denormalization
Application performance criteria.
Future application development and
maintenance considerations.
Volatility of application requirements.
Relations between transactions and relations of
entities involved.
Transaction type (update/query, OLTP/OLAP).
Transaction frequency.
Access paths needed by each transaction.
Number of rows accessed by each transaction.
Number of pages/blocks accessed by each
transaction.
Cardinality of each relation
When in doubt, don’t denormalize
Thank you!

RDBMS Denormalization - Benefits & Pitfalls

  • 2. Hello! I’m Shyam Anand. In the software industry for over 10 years. Currently Software Architect at Turvo Inc. Previously have headed engineering for a couple of startups. mail@shyam-anand.com | linkedin.com/in/shyamanand
  • 3. Introduction A practical view of denormalization - When to denormalize - What strategies can be used - Considerations before denormalizing Denormalization can enhance query performance when it is deployed with a complete understanding of application requirements.
  • 4. Normalization Optimize for Data Capture Process of grouping attributes into refined structures In accordance with a series of “normal forms” To reduce redundancy and improve data integrity
  • 5. Objectives of Normalization 1. To free the collection of relations from undesirable insertion, update and deletion dependencies. 2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the lifespan of application programs. 3. To make the relational model more informative to users. 4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by. ~ Edgar F. Codd, “Further Normalization of the Data Base Relational Model”
  • 6. Objectives of Normalization Prevent Insertion, Update, and Deletion anomalies Minimize redesign when extending the database structure - A fully normalized database allows its structure to be extended to accommodate new types of data without changing existing structure too much. - As a result, applications interacting with the database are minimally affected.
  • 7. First Normal Form (1NF) - Separate table for each set of related attributes - Each field is atomic Student ID Student Name Subjects 100 Alice Databases, Programming Student ID Student Name 100 Alice Subject ID Student ID Subject 1 100 Databases 2 100 Programming
  • 8. Second Normal Form (2NF) - Satisfies 1NF - Every non-prime attribute is dependant on the whole of every candidate key. Manufacturer Model Country Maruti Brezza India Maruti Baleno India Kia Seltos S. Korea Kia Sonnet S. Korea Manufacturer Country Maruti India Kia S. Korea Manufacturer Model Maruti Brezza Maruti Baleno Kia Seltos Kia Sonnet
  • 9. Third Normal Form (3NF) - Satisfies 2NF - All the attributes are functionally dependant on solely the primary key. - Repeating values are not dependant on a primary key A database relation is described as “normalized” if it meets 3NF. Most 3NF relations are free of insertion, update, and deletion anomalies.
  • 10. Third Normal Form (3NF) Manufacturer Model Country Maruti Brezza India Maruti Baleno India Kia Seltos S. Korea Kia Sonnet S. Korea Manufacturer Country Maruti India Kia S. Korea Manufacturer Model Maruti Brezza Maruti Baleno Kia Seltos Kia Sonnet
  • 11. Other Normal Forms - Boyce/Codd Normal Form (BCNF) - Elementary Key Normal Form (EKNF) - Fourth Normal Form (4NF) - Fifth Normal Form (5NF) - Essential Tuple Normal Form (ETNF) - Domain-Key Normal Form (DKNF) - Six Normal Form (6NF) Mostly academic, not widely implemented
  • 12. Drawbacks: Poor System Performance. Full normalization results in a number of logically separate entities that, in turn, result in even more physically separate stored files. The net effect is that join processing against normalized tables requires additional system resources. It may also cause significant inefficiencies when there are few updates and many query retrievals involving a large number of join operations.
  • 13. Denormalization: Optimize for Data Access. The process of reducing the degree of normalization, by adding redundant copies of data or by grouping data, to improve query performance.
  • 14. Objectives of Denormalization Improve the read performance of a database. More intuitive data structure for data warehousing. Put enterprise data at the disposal of organizational decision makers. Often motivated by performance or scalability in relational database software needing to carry out very large numbers of read operations.
  • 15. Benefits of Denormalization Reduces the number of physical tables that must be accessed to retrieve the data by reducing the number of joins needed. Provides better performance and a more intuitive data structure for users to navigate. Useful in data warehousing implementations for data mining.
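To make the "fewer joins" point concrete, here is a small sketch in which the same report is read once through a join and once from a single denormalized table. The customer/orders schema is hypothetical and not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      INTEGER NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25), (11, 1, 40), (12, 2, 15);

    -- Denormalized copy: the customer name is repeated on every order row.
    CREATE TABLE orders_denorm AS
        SELECT o.order_id, o.customer_id, c.name AS customer_name, o.amount
        FROM orders AS o
        JOIN customer AS c ON c.customer_id = o.customer_id;
""")

# Normalized read path: two tables and one join.
normalized_rows = conn.execute("""
    SELECT o.order_id, c.name, o.amount
    FROM orders AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

# Denormalized read path: a single table, no join.
denormalized_rows = conn.execute(
    "SELECT order_id, customer_name, amount FROM orders_denorm ORDER BY order_id"
).fetchall()

print(normalized_rows == denormalized_rows)
```

Both queries return the same rows; the second touches one table at the price of storing each customer name several times.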
  • 16. Denormalization Strategies
      - Collapsing tables
      - Splitting tables (horizontal/vertical)
      - Adding redundant columns (reference data)
      - Derived attributes (summary, total, balance)
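One of these strategies, the derived attribute, might look like the sketch below: a balance column stored on the account row instead of being summed from the ledger on every read. The schema and the post_entry helper are hypothetical, and amounts are kept as integer cents to avoid floating-point rounding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        account_id    INTEGER PRIMARY KEY,
        balance_cents INTEGER NOT NULL DEFAULT 0  -- derived: sum of ledger rows
    );
    CREATE TABLE ledger (
        entry_id     INTEGER PRIMARY KEY,
        account_id   INTEGER NOT NULL REFERENCES account(account_id),
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO account (account_id) VALUES (1);
""")


def post_entry(conn, account_id, amount_cents):
    """Insert a ledger row and adjust the stored balance in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO ledger (account_id, amount_cents) VALUES (?, ?)",
            (account_id, amount_cents),
        )
        conn.execute(
            "UPDATE account SET balance_cents = balance_cents + ? WHERE account_id = ?",
            (amount_cents, account_id),
        )


for amount in (10000, -3000, 550):
    post_entry(conn, 1, amount)

# Reading the balance is a single-row lookup rather than an aggregate.
stored_balance = conn.execute(
    "SELECT balance_cents FROM account WHERE account_id = 1"
).fetchone()[0]
recomputed_balance = conn.execute(
    "SELECT SUM(amount_cents) FROM ledger WHERE account_id = 1"
).fetchone()[0]
print(stored_balance, recomputed_balance)
```

The write path now carries an extra UPDATE, which is the cost paid for the cheaper read.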
  • 17. Snowflake and Star Schemas Fact tables connected to multiple dimensions. Snowflake schema has dimensions normalized. Star schema dimensions are denormalized, with each dimension represented by a single table. Snowflake for better data integrity, and Star for better performance.
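A minimal star-schema sketch, assuming hypothetical fact_sales, dim_product, and dim_date tables: the product dimension keeps the category name inline, where a snowflake design would move it to its own table and require one more join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star-style dimension: the category is stored inline (denormalized).
    CREATE TABLE dim_product (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT NOT NULL,
        category_name TEXT NOT NULL
    );
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        year    INTEGER NOT NULL,
        month   INTEGER NOT NULL
    );
    -- Fact table: one row per sale, pointing at each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
        quantity   INTEGER NOT NULL
    );
    INSERT INTO dim_product VALUES
        (1, 'Pen', 'Stationery'),
        (2, 'Notebook', 'Stationery'),
        (3, 'Mug', 'Kitchen');
    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 10),
        (2, 20240101, 4),
        (3, 20240201, 2),
        (1, 20240201, 6);
""")

# One join from the fact table to the dimension is enough to group by category.
sales_by_category = conn.execute("""
    SELECT p.category_name, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()
print(sales_by_category)
```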
  • 18. Performance at a Cost. Denormalization decisions usually involve trade-offs between flexibility and performance. It is the database designer's responsibility to ensure that the denormalized database does not become inconsistent. This is done by creating constraints that specify how the redundant copies of information must be kept synchronized, which may easily make the denormalization procedure pointless. The increase in logical complexity of the database design and the added complexity of the additional constraints make this approach hazardous. Constraints introduce a trade-off, speeding up reads while slowing down writes. This means a denormalized database under heavy write load may offer worse performance than its functionally equivalent normalized counterpart.
  • 19. Drawbacks
      - Data duplication
      - More complex data-integrity rules
      - Update anomalies
      - Increased difficulty in expressing the type of access
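An update anomaly can be reproduced in a few lines. The sketch below uses a single wide car table (an assumed name) in which the country is repeated per model, then updates only one of the redundant copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized: the manufacturer's country is repeated on every model row.
    CREATE TABLE car (manufacturer TEXT, model TEXT, country TEXT);
    INSERT INTO car VALUES
        ('Kia', 'Seltos', 'S. Korea'),
        ('Kia', 'Sonet', 'S. Korea');
""")

# An update that reaches only one of the redundant copies.
conn.execute("UPDATE car SET country = 'South Korea' WHERE model = 'Seltos'")

# The same manufacturer now reports two different countries.
conflicting = conn.execute("""
    SELECT manufacturer, COUNT(DISTINCT country)
    FROM car
    GROUP BY manufacturer
    HAVING COUNT(DISTINCT country) > 1
""").fetchall()
print(conflicting)
```

With the country held in a separate manufacturer table, the same change would touch exactly one row and could not leave the data in disagreement.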
  • 20. Addressing Drawbacks. Update anomalies can generally be resolved by using triggers, application logic, or batch reconciliation. Triggers provide the best solution from an integrity point of view, but can be costly in terms of performance. Application logic can update denormalized data to ensure that changes are atomic, but this is risky, because the same logic must be used and maintained in all applications that modify the data. Batch reconciliation can be run at intervals to bring the data into agreement, but it can affect system performance.
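Two of these approaches, a trigger and a batch reconciliation pass, are sketched below with sqlite3. The schema, the trigger name sync_customer_name, and the find_drift and reconcile helpers are all hypothetical; a production system would need to cover inserts and deletes as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL REFERENCES customer(customer_id),
        customer_name TEXT NOT NULL  -- redundant copy of customer.name
    );
    -- Trigger: propagate a name change to every redundant copy.
    CREATE TRIGGER sync_customer_name
    AFTER UPDATE OF name ON customer
    BEGIN
        UPDATE orders SET customer_name = NEW.name
        WHERE customer_id = NEW.customer_id;
    END;
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO orders VALUES (10, 1, 'Alice'), (11, 1, 'Alice');
""")

conn.execute("UPDATE customer SET name = 'Alice Smith' WHERE customer_id = 1")
names_after_trigger = conn.execute(
    "SELECT DISTINCT customer_name FROM orders"
).fetchall()


def find_drift(conn):
    """Return the order ids whose redundant name disagrees with the source."""
    return conn.execute("""
        SELECT o.order_id
        FROM orders AS o
        JOIN customer AS c ON c.customer_id = o.customer_id
        WHERE o.customer_name <> c.name
        ORDER BY o.order_id
    """).fetchall()


def reconcile(conn):
    """Batch pass: overwrite stale copies and return how many rows changed."""
    cur = conn.execute("""
        UPDATE orders
        SET customer_name = (
            SELECT name FROM customer
            WHERE customer.customer_id = orders.customer_id
        )
        WHERE customer_name <> (
            SELECT name FROM customer
            WHERE customer.customer_id = orders.customer_id
        )
    """)
    return cur.rowcount


# Simulate a write that bypasses the trigger by editing the copy directly.
conn.execute("UPDATE orders SET customer_name = 'stale' WHERE order_id = 11")
drift_before = find_drift(conn)
rows_fixed = reconcile(conn)
drift_after = find_drift(conn)
print(names_after_trigger, drift_before, rows_fixed, drift_after)
```

The trigger keeps the copies correct on every write to customer, while the reconciliation pass repairs whatever slipped past it, at the cost of scanning the table.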
  • 21. A Denormalization Process Model Primary goals are to improve query performance and present a less complex and more user-oriented view of data. Denormalization should be only considered when performance is an issue, and only after there has been a thorough analysis of the various impacted systems. Data should be first normalized as the design is being conceptualized, and then denormalized in response to the performance requirements.
  • 22. Criteria for Denormalization General application performance requirements indicated by business needs. Online response time requirements for application queries, updates and processes. Minimum number of data access paths. Minimum amount of storage.
  • 23. DB Design Cycle with Denormalization
      1. Development of a conceptual data model (ER diagram)
      2. Refinement and normalization
      3. Identifying candidates for denormalization
      4. Determining the effect of denormalizing entities on data integrity
      5. Identifying what form the denormalized entity may take
      6. Mapping the conceptual scheme to the physical scheme
  • 24. When Considering Denormalization Analysis of the advantages and disadvantages of possible implementations is needed. It may not be possible to accomplish a full denormalization that meets all specified criteria. The database designer should evaluate the degree of importance of each criterion.
  • 25. Other Considerations of Denormalization
      - Application performance criteria
      - Future application development and maintenance considerations
      - Volatility of application requirements
      - Relations between transactions and relations of entities involved
      - Transaction type (update/query, OLTP/OLAP)
      - Transaction frequency
      - Access paths needed by each transaction
      - Number of rows accessed by each transaction
      - Number of pages/blocks accessed by each transaction
      - Cardinality of each relation

      When in doubt, don’t denormalize.