SlideShare a Scribd company logo
1 of 18
Ahsan AbdullahAhsan Abdullah
11
Data WarehousingData Warehousing
Lecture-9Lecture-9
Issues of De-normalizationIssues of De-normalization
Virtual University of PakistanVirtual University of Pakistan
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan@cluxing.com
Ahsan Abdullah
2
Issues of De-normalizationIssues of De-normalization
Ahsan Abdullah
3
Why Issues?Why Issues?
Ahsan Abdullah
4
Issues of DenormalizationIssues of Denormalization
 StorageStorage
 PerformancePerformance
 Ease-of-useEase-of-use
 MaintenanceMaintenance
Ahsan Abdullah
5
Industry CharacteristicsIndustry Characteristics
Master:Detail RatiosMaster:Detail Ratios
 Health care 1:2 ratioHealth care 1:2 ratio
 Video Rental 1:3 ratioVideo Rental 1:3 ratio
 Retail 1:30 ratioRetail 1:30 ratio
Ahsan Abdullah
6
Storage Issues: Pre-joining FactsStorage Issues: Pre-joining Facts
 Assume 1:2 record count ratio between claimAssume 1:2 record count ratio between claim
master and detail for health-care application.master and detail for health-care application.
 Assume 10 million members (20 millionAssume 10 million members (20 million
records in claim detail).records in claim detail).
 Assume 10 byte member_ID.Assume 10 byte member_ID.
 Assume 40 byte header for master and 60Assume 40 byte header for master and 60
byte header for detail tables.byte header for detail tables.
Ahsan Abdullah
7
With normalization:With normalization:
Total space used = 10 x 40 + 20 x 60 = 1.6 GBTotal space used = 10 x 40 + 20 x 60 = 1.6 GB
After denormalization:After denormalization:
Total space used = (60 + 40 – 10) x 20 = 1.8 GBTotal space used = (60 + 40 – 10) x 20 = 1.8 GB
Net result is 12.5% additional space requiredNet result is 12.5% additional space required
in raw data table size for the database.in raw data table size for the database.
Storage Issues: Pre-joining (Calculations)Storage Issues: Pre-joining (Calculations)
Ahsan Abdullah
8
Consider the queryConsider the query “How many members“How many members
were paid claims during last year?”were paid claims during last year?”
With normalization:With normalization:
Simply count the number of records in the masterSimply count the number of records in the master
table.table.
After denormalization:After denormalization:
The member_ID would be repeated, hence need aThe member_ID would be repeated, hence need a
count distinct. This will cause sorting on a largercount distinct. This will cause sorting on a larger
table and degraded performance.table and degraded performance.
Performance Issues: Pre-joiningPerformance Issues: Pre-joining
Ahsan Abdullah
9
Depending on the query, the performanceDepending on the query, the performance
actually deteriorates with denormalization!actually deteriorates with denormalization!
This is due to the following three reasons:This is due to the following three reasons:
 Forcing a sort due to count distinct.Forcing a sort due to count distinct.
 Using a table with 1.5 times header size.Using a table with 1.5 times header size.
 Using a table which is 2 times larger.Using a table which is 2 times larger.
 Resulting in 3 times degradation inResulting in 3 times degradation in
performance.performance.
Bottom Line: Other than 0.2 GB additionalBottom Line: Other than 0.2 GB additional
space, also keep the 0.4 GB master table.space, also keep the 0.4 GB master table.
Why Performance Issues: Pre-joiningWhy Performance Issues: Pre-joining
Ahsan Abdullah
10
Continuing with the previous Health-CareContinuing with the previous Health-Care
example, assuming a 60 byte detail table andexample, assuming a 60 byte detail table and
10 byte Sale_Person.10 byte Sale_Person.
 Copying the Sale_Person to the detail tableCopying the Sale_Person to the detail table
results in all scans taking 16% longer thanresults in all scans taking 16% longer than
previously.previously.
 Justifiable only if significant portion of queries getJustifiable only if significant portion of queries get
benefit by accessing the denormalized detailbenefit by accessing the denormalized detail
table.table.
 Need to look at the cost-benefit trade-off for eachNeed to look at the cost-benefit trade-off for each
denormalization decision.denormalization decision.
Performance Issues: Adding redundant columnsPerformance Issues: Adding redundant columns
Ahsan Abdullah
11
Other issues include, increase in table size,
maintenance and loss of information:
 The size of the (largest table i.e.) transaction table
increases by the size of the Sale_Person key.
 For the example being considered, the detail table size
increases from 1.2 GB to 1.32 GB.
 If the Sale_Person key changes (e.g. new 12 digit
NID), then updates to be reflected all the way to
transaction table.
 In the absence of 1:M relationship, column movement
will actually result in loss of data.
Other Issues: Adding redundant columnsOther Issues: Adding redundant columns
Ahsan Abdullah
12
Horizontal splitting is a Divide&Conquer techniqueHorizontal splitting is a Divide&Conquer technique
that exploits parallelism. The conquer part of thethat exploits parallelism. The conquer part of the
technique is about combining the results.technique is about combining the results.
Lets see how it works forLets see how it works for hash basedhash based
splitting/partitioning.splitting/partitioning.
 Assuming uniform hashing, hash splitting supportsAssuming uniform hashing, hash splitting supports
even data distribution across all partitions in a pre-even data distribution across all partitions in a pre-
defined manner.defined manner.
 However, hash based splitting is not easilyHowever, hash based splitting is not easily
reversible to eliminate the split.reversible to eliminate the split.
Ease of use Issues: Horizontal SplittingEase of use Issues: Horizontal Splitting
Ahsan Abdullah
13
Ease of use Issues: Horizontal SplittingEase of use Issues: Horizontal Splitting
?
Ahsan Abdullah
14
Round robin and random splitting:Round robin and random splitting:
Guarantee good data distribution.Guarantee good data distribution.
Almost impossible to reverse (or undo).Almost impossible to reverse (or undo).
Not pre-defined.Not pre-defined.
Ease of use Issues: Horizontal SplittingEase of use Issues: Horizontal Splitting
Ahsan Abdullah
15
Range and expression splitting:Range and expression splitting:
Can facilitate partition elimination with aCan facilitate partition elimination with a
smart optimizer.smart optimizer.
Generally lead to "hot spots” (unevenGenerally lead to "hot spots” (uneven
distribution of data).distribution of data).
Ease of use Issues: Horizontal SplittingEase of use Issues: Horizontal Splitting
Ahsan Abdullah
16
Performance Issues: Horizontal SplittingPerformance Issues: Horizontal Splitting
P1 P2 P3 P4
1998 1999 2000 2001
Dramatic cancellation
of airline reservations
after 9/11, resulting in
“hot spot”
Splitting based on year
Processors
Ahsan Abdullah
17
Performance issues: Vertical Splitting FactsPerformance issues: Vertical Splitting Facts
Example:Example: Consider a 100 byte header for theConsider a 100 byte header for the
member table such that 20 bytes providemember table such that 20 bytes provide
complete coverage for 90% of the queries.complete coverage for 90% of the queries.
Split the member table into two parts as follows:Split the member table into two parts as follows:
1.1. Frequently accessed portion of table (20 bytes),Frequently accessed portion of table (20 bytes),
andand
2. Infrequently accessed portion of table (80+2. Infrequently accessed portion of table (80+
bytes).bytes). Why 80+?Why 80+?
Note that primary key (member_id) must beNote that primary key (member_id) must be
present in both tables for eliminating the split.present in both tables for eliminating the split.
Ahsan Abdullah
18
Scanning the claim table for mostScanning the claim table for most
frequently used queries will be 500% fasterfrequently used queries will be 500% faster
with vertical splittingwith vertical splitting
IronicallyIronically, for the “infrequently” accessed, for the “infrequently” accessed
queries the performance will be inferior asqueries the performance will be inferior as
compared to the un-split table because ofcompared to the un-split table because of
the join overhead.the join overhead.
Performance issues: Vertical Splitting Good vs. BadPerformance issues: Vertical Splitting Good vs. Bad

More Related Content

What's hot

Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...BI Brainz
 
Chapter 7. Advanced Frequent Pattern Mining.ppt
Chapter 7. Advanced Frequent Pattern Mining.pptChapter 7. Advanced Frequent Pattern Mining.ppt
Chapter 7. Advanced Frequent Pattern Mining.pptSubrata Kumer Paul
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Database vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewDatabase vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewHealth Catalyst
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Data Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsData Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsIJMER
 
Data weekender4.2 azure purview erwin de kreuk
Data weekender4.2  azure purview erwin de kreukData weekender4.2  azure purview erwin de kreuk
Data weekender4.2 azure purview erwin de kreukErwin de Kreuk
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Enabling a Data Mesh Architecture and Data Sharing Culture with Denodo
Enabling a Data Mesh Architecture and Data Sharing Culture with DenodoEnabling a Data Mesh Architecture and Data Sharing Culture with Denodo
Enabling a Data Mesh Architecture and Data Sharing Culture with DenodoDenodo
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technologyDataminingTools Inc
 
Data-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesData-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesDATAVERSITY
 
how to move data from on premise to ssis in google cloud platform ,azure, sno...
how to move data from on premise to ssis in google cloud platform ,azure, sno...how to move data from on premise to ssis in google cloud platform ,azure, sno...
how to move data from on premise to ssis in google cloud platform ,azure, sno...KeerthuBabu
 
Data quality architecture
Data quality architectureData quality architecture
Data quality architectureanicewick
 

What's hot (20)

Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
 
Chapter 7. Advanced Frequent Pattern Mining.ppt
Chapter 7. Advanced Frequent Pattern Mining.pptChapter 7. Advanced Frequent Pattern Mining.ppt
Chapter 7. Advanced Frequent Pattern Mining.ppt
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
 
Database vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewDatabase vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative Review
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Data Engineering Basics
Data Engineering BasicsData Engineering Basics
Data Engineering Basics
 
Data Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsData Mining: Future Trends and Applications
Data Mining: Future Trends and Applications
 
Data weekender4.2 azure purview erwin de kreuk
Data weekender4.2  azure purview erwin de kreukData weekender4.2  azure purview erwin de kreuk
Data weekender4.2 azure purview erwin de kreuk
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
Enabling a Data Mesh Architecture and Data Sharing Culture with Denodo
Enabling a Data Mesh Architecture and Data Sharing Culture with DenodoEnabling a Data Mesh Architecture and Data Sharing Culture with Denodo
Enabling a Data Mesh Architecture and Data Sharing Culture with Denodo
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technology
 
Data-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesData-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success Stories
 
how to move data from on premise to ssis in google cloud platform ,azure, sno...
how to move data from on premise to ssis in google cloud platform ,azure, sno...how to move data from on premise to ssis in google cloud platform ,azure, sno...
how to move data from on premise to ssis in google cloud platform ,azure, sno...
 
Data quality architecture
Data quality architectureData quality architecture
Data quality architecture
 

Viewers also liked

Lecture 03 - The Data Warehouse and Design
Lecture 03 - The Data Warehouse and Design Lecture 03 - The Data Warehouse and Design
Lecture 03 - The Data Warehouse and Design phanleson
 
Entity relationship diagram - Concept on normalization
Entity relationship diagram - Concept on normalizationEntity relationship diagram - Concept on normalization
Entity relationship diagram - Concept on normalizationSatya Pal
 
Lecture 04 normalization
Lecture 04 normalization Lecture 04 normalization
Lecture 04 normalization emailharmeet
 
Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Jargalsaikhan Alyeksandr
 

Viewers also liked (6)

Lecture 03 - The Data Warehouse and Design
Lecture 03 - The Data Warehouse and Design Lecture 03 - The Data Warehouse and Design
Lecture 03 - The Data Warehouse and Design
 
Denormalization
DenormalizationDenormalization
Denormalization
 
Denormalization
DenormalizationDenormalization
Denormalization
 
Entity relationship diagram - Concept on normalization
Entity relationship diagram - Concept on normalizationEntity relationship diagram - Concept on normalization
Entity relationship diagram - Concept on normalization
 
Lecture 04 normalization
Lecture 04 normalization Lecture 04 normalization
Lecture 04 normalization
 
Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)
 

Similar to DW Lecture on Issues of Denormalization

Hugaccumulo 121018192044-phpapp02
Hugaccumulo 121018192044-phpapp02Hugaccumulo 121018192044-phpapp02
Hugaccumulo 121018192044-phpapp02Sqrrl
 
Dwh lecture slides-week5&6
Dwh lecture slides-week5&6Dwh lecture slides-week5&6
Dwh lecture slides-week5&6Shani729
 
Black and White Energy Savings Tuning Truths
Black and White Energy Savings Tuning TruthsBlack and White Energy Savings Tuning Truths
Black and White Energy Savings Tuning TruthsScott Hayes
 
VMworld 2014: Virtualizing Databases
VMworld 2014: Virtualizing DatabasesVMworld 2014: Virtualizing Databases
VMworld 2014: Virtualizing DatabasesVMworld
 
Performance tuning intro
Performance tuning introPerformance tuning intro
Performance tuning introaioughydchapter
 
Xiotech Redefining Storage Value
Xiotech   Redefining Storage ValueXiotech   Redefining Storage Value
Xiotech Redefining Storage Valuehypknight
 
Making MySQL Great For Business Intelligence
Making MySQL Great For Business IntelligenceMaking MySQL Great For Business Intelligence
Making MySQL Great For Business IntelligenceCalpont
 
Lecture 10
Lecture 10Lecture 10
Lecture 10Shani729
 
Apache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why CareApache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why CareDatabricks
 
Webinar: How To Make Hybrid Perform Like All Flash
Webinar: How To Make Hybrid Perform Like All FlashWebinar: How To Make Hybrid Perform Like All Flash
Webinar: How To Make Hybrid Perform Like All FlashStorage Switzerland
 
Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory HBaseCon
 
Webinar: The All-Flash Data Center, Myth or Reality?
Webinar: The All-Flash Data Center, Myth or Reality?Webinar: The All-Flash Data Center, Myth or Reality?
Webinar: The All-Flash Data Center, Myth or Reality?Storage Switzerland
 
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - SlidesSeveralnines
 
Data organization: hive meetup
Data organization: hive meetupData organization: hive meetup
Data organization: hive meetupt3rmin4t0r
 
IT Made Me Virtualize Essbase and Performance Sucks
IT Made Me Virtualize Essbase and Performance SucksIT Made Me Virtualize Essbase and Performance Sucks
IT Made Me Virtualize Essbase and Performance SucksUS-Analytics
 
BW on HANA optimisation answers
BW on HANA optimisation answersBW on HANA optimisation answers
BW on HANA optimisation answersAjay Kumar Uppal
 
Db2 performance tuning for dummies
Db2 performance tuning for dummiesDb2 performance tuning for dummies
Db2 performance tuning for dummiesAngel Dueñas Neyra
 
What's New in MySQL 5.7
What's New in MySQL 5.7What's New in MySQL 5.7
What's New in MySQL 5.7Olivier DASINI
 
IN-MEMORY DATABASE SYSTEMS.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS.SAP HANA DATABASE.IN-MEMORY DATABASE SYSTEMS.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS.SAP HANA DATABASE.George Joseph
 

Similar to DW Lecture on Issues of Denormalization (20)

Hugaccumulo 121018192044-phpapp02
Hugaccumulo 121018192044-phpapp02Hugaccumulo 121018192044-phpapp02
Hugaccumulo 121018192044-phpapp02
 
Dwh lecture slides-week5&6
Dwh lecture slides-week5&6Dwh lecture slides-week5&6
Dwh lecture slides-week5&6
 
Black and White Energy Savings Tuning Truths
Black and White Energy Savings Tuning TruthsBlack and White Energy Savings Tuning Truths
Black and White Energy Savings Tuning Truths
 
Performance Tuning intro
Performance Tuning introPerformance Tuning intro
Performance Tuning intro
 
VMworld 2014: Virtualizing Databases
VMworld 2014: Virtualizing DatabasesVMworld 2014: Virtualizing Databases
VMworld 2014: Virtualizing Databases
 
Performance tuning intro
Performance tuning introPerformance tuning intro
Performance tuning intro
 
Xiotech Redefining Storage Value
Xiotech   Redefining Storage ValueXiotech   Redefining Storage Value
Xiotech Redefining Storage Value
 
Making MySQL Great For Business Intelligence
Making MySQL Great For Business IntelligenceMaking MySQL Great For Business Intelligence
Making MySQL Great For Business Intelligence
 
Lecture 10
Lecture 10Lecture 10
Lecture 10
 
Apache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why CareApache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why Care
 
Webinar: How To Make Hybrid Perform Like All Flash
Webinar: How To Make Hybrid Perform Like All FlashWebinar: How To Make Hybrid Perform Like All Flash
Webinar: How To Make Hybrid Perform Like All Flash
 
Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory
 
Webinar: The All-Flash Data Center, Myth or Reality?
Webinar: The All-Flash Data Center, Myth or Reality?Webinar: The All-Flash Data Center, Myth or Reality?
Webinar: The All-Flash Data Center, Myth or Reality?
 
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
 
Data organization: hive meetup
Data organization: hive meetupData organization: hive meetup
Data organization: hive meetup
 
IT Made Me Virtualize Essbase and Performance Sucks
IT Made Me Virtualize Essbase and Performance SucksIT Made Me Virtualize Essbase and Performance Sucks
IT Made Me Virtualize Essbase and Performance Sucks
 
BW on HANA optimisation answers
BW on HANA optimisation answersBW on HANA optimisation answers
BW on HANA optimisation answers
 
Db2 performance tuning for dummies
Db2 performance tuning for dummiesDb2 performance tuning for dummies
Db2 performance tuning for dummies
 
What's New in MySQL 5.7
What's New in MySQL 5.7What's New in MySQL 5.7
What's New in MySQL 5.7
 
IN-MEMORY DATABASE SYSTEMS.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS.SAP HANA DATABASE.IN-MEMORY DATABASE SYSTEMS.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS.SAP HANA DATABASE.
 

More from Shani729

Python tutorialfeb152012
Python tutorialfeb152012Python tutorialfeb152012
Python tutorialfeb152012Shani729
 
Python tutorial
Python tutorialPython tutorial
Python tutorialShani729
 
Interaction design _beyond_human_computer_interaction
Interaction design _beyond_human_computer_interactionInteraction design _beyond_human_computer_interaction
Interaction design _beyond_human_computer_interactionShani729
 
Fm lecturer 13(final)
Fm lecturer 13(final)Fm lecturer 13(final)
Fm lecturer 13(final)Shani729
 
Lecture slides week14-15
Lecture slides week14-15Lecture slides week14-15
Lecture slides week14-15Shani729
 
Frequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodFrequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodShani729
 
Dwh lecture slides-week15
Dwh lecture slides-week15Dwh lecture slides-week15
Dwh lecture slides-week15Shani729
 
Dwh lecture slides-week10
Dwh lecture slides-week10Dwh lecture slides-week10
Dwh lecture slides-week10Shani729
 
Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8Shani729
 
Dwh lecture slides-week3&4
Dwh lecture slides-week3&4Dwh lecture slides-week3&4
Dwh lecture slides-week3&4Shani729
 
Dwh lecture slides-week2
Dwh lecture slides-week2Dwh lecture slides-week2
Dwh lecture slides-week2Shani729
 
Dwh lecture slides-week1
Dwh lecture slides-week1Dwh lecture slides-week1
Dwh lecture slides-week1Shani729
 
Dwh lecture slides-week 13
Dwh lecture slides-week 13Dwh lecture slides-week 13
Dwh lecture slides-week 13Shani729
 
Dwh lecture slides-week 12&13
Dwh lecture slides-week 12&13Dwh lecture slides-week 12&13
Dwh lecture slides-week 12&13Shani729
 
Data warehousing and mining furc
Data warehousing and mining furcData warehousing and mining furc
Data warehousing and mining furcShani729
 
Lecture 40
Lecture 40Lecture 40
Lecture 40Shani729
 
Lecture 39
Lecture 39Lecture 39
Lecture 39Shani729
 
Lecture 38
Lecture 38Lecture 38
Lecture 38Shani729
 
Lecture 37
Lecture 37Lecture 37
Lecture 37Shani729
 
Lecture 35
Lecture 35Lecture 35
Lecture 35Shani729
 

More from Shani729 (20)

Python tutorialfeb152012
Python tutorialfeb152012Python tutorialfeb152012
Python tutorialfeb152012
 
Python tutorial
Python tutorialPython tutorial
Python tutorial
 
Interaction design _beyond_human_computer_interaction
Interaction design _beyond_human_computer_interactionInteraction design _beyond_human_computer_interaction
Interaction design _beyond_human_computer_interaction
 
Fm lecturer 13(final)
Fm lecturer 13(final)Fm lecturer 13(final)
Fm lecturer 13(final)
 
Lecture slides week14-15
Lecture slides week14-15Lecture slides week14-15
Lecture slides week14-15
 
Frequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodFrequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth method
 
Dwh lecture slides-week15
Dwh lecture slides-week15Dwh lecture slides-week15
Dwh lecture slides-week15
 
Dwh lecture slides-week10
Dwh lecture slides-week10Dwh lecture slides-week10
Dwh lecture slides-week10
 
Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8
 
Dwh lecture slides-week3&4
Dwh lecture slides-week3&4Dwh lecture slides-week3&4
Dwh lecture slides-week3&4
 
Dwh lecture slides-week2
Dwh lecture slides-week2Dwh lecture slides-week2
Dwh lecture slides-week2
 
Dwh lecture slides-week1
Dwh lecture slides-week1Dwh lecture slides-week1
Dwh lecture slides-week1
 
Dwh lecture slides-week 13
Dwh lecture slides-week 13Dwh lecture slides-week 13
Dwh lecture slides-week 13
 
Dwh lecture slides-week 12&13
Dwh lecture slides-week 12&13Dwh lecture slides-week 12&13
Dwh lecture slides-week 12&13
 
Data warehousing and mining furc
Data warehousing and mining furcData warehousing and mining furc
Data warehousing and mining furc
 
Lecture 40
Lecture 40Lecture 40
Lecture 40
 
Lecture 39
Lecture 39Lecture 39
Lecture 39
 
Lecture 38
Lecture 38Lecture 38
Lecture 38
 
Lecture 37
Lecture 37Lecture 37
Lecture 37
 
Lecture 35
Lecture 35Lecture 35
Lecture 35
 

Recently uploaded

CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 

Recently uploaded (20)

CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 

DW Lecture on Issues of Denormalization

  • 1. Ahsan AbdullahAhsan Abdullah 11 Data WarehousingData Warehousing Lecture-9Lecture-9 Issues of De-normalizationIssues of De-normalization Virtual University of PakistanVirtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research www.nu.edu.pk/cairindex.asp National University of Computers & Emerging Sciences, Islamabad Email: ahsan@cluxing.com
  • 2. Ahsan Abdullah 2 Issues of De-normalizationIssues of De-normalization
  • 4. Ahsan Abdullah 4 Issues of DenormalizationIssues of Denormalization  StorageStorage  PerformancePerformance  Ease-of-useEase-of-use  MaintenanceMaintenance
  • 5. Ahsan Abdullah 5 Industry CharacteristicsIndustry Characteristics Master:Detail RatiosMaster:Detail Ratios  Health care 1:2 ratioHealth care 1:2 ratio  Video Rental 1:3 ratioVideo Rental 1:3 ratio  Retail 1:30 ratioRetail 1:30 ratio
  • 6. Ahsan Abdullah 6 Storage Issues: Pre-joining FactsStorage Issues: Pre-joining Facts  Assume 1:2 record count ratio between claimAssume 1:2 record count ratio between claim master and detail for health-care application.master and detail for health-care application.  Assume 10 million members (20 millionAssume 10 million members (20 million records in claim detail).records in claim detail).  Assume 10 byte member_ID.Assume 10 byte member_ID.  Assume 40 byte header for master and 60Assume 40 byte header for master and 60 byte header for detail tables.byte header for detail tables.
  • 7. Ahsan Abdullah 7 With normalization:With normalization: Total space used = 10 x 40 + 20 x 60 = 1.6 GBTotal space used = 10 x 40 + 20 x 60 = 1.6 GB After denormalization:After denormalization: Total space used = (60 + 40 – 10) x 20 = 1.8 GBTotal space used = (60 + 40 – 10) x 20 = 1.8 GB Net result is 12.5% additional space requiredNet result is 12.5% additional space required in raw data table size for the database.in raw data table size for the database. Storage Issues: Pre-joining (Calculations)Storage Issues: Pre-joining (Calculations)
  • 8. Ahsan Abdullah 8 Consider the queryConsider the query “How many members“How many members were paid claims during last year?”were paid claims during last year?” With normalization:With normalization: Simply count the number of records in the masterSimply count the number of records in the master table.table. After denormalization:After denormalization: The member_ID would be repeated, hence need aThe member_ID would be repeated, hence need a count distinct. This will cause sorting on a largercount distinct. This will cause sorting on a larger table and degraded performance.table and degraded performance. Performance Issues: Pre-joiningPerformance Issues: Pre-joining
  • 9. Ahsan Abdullah 9 Depending on the query, the performanceDepending on the query, the performance actually deteriorates with denormalization!actually deteriorates with denormalization! This is due to the following three reasons:This is due to the following three reasons:  Forcing a sort due to count distinct.Forcing a sort due to count distinct.  Using a table with 1.5 times header size.Using a table with 1.5 times header size.  Using a table which is 2 times larger.Using a table which is 2 times larger.  Resulting in 3 times degradation inResulting in 3 times degradation in performance.performance. Bottom Line: Other than 0.2 GB additionalBottom Line: Other than 0.2 GB additional space, also keep the 0.4 GB master table.space, also keep the 0.4 GB master table. Why Performance Issues: Pre-joiningWhy Performance Issues: Pre-joining
  • 10. Ahsan Abdullah 10 Continuing with the previous Health-CareContinuing with the previous Health-Care example, assuming a 60 byte detail table andexample, assuming a 60 byte detail table and 10 byte Sale_Person.10 byte Sale_Person.  Copying the Sale_Person to the detail tableCopying the Sale_Person to the detail table results in all scans taking 16% longer thanresults in all scans taking 16% longer than previously.previously.  Justifiable only if significant portion of queries getJustifiable only if significant portion of queries get benefit by accessing the denormalized detailbenefit by accessing the denormalized detail table.table.  Need to look at the cost-benefit trade-off for eachNeed to look at the cost-benefit trade-off for each denormalization decision.denormalization decision. Performance Issues: Adding redundant columnsPerformance Issues: Adding redundant columns
  • 11. Ahsan Abdullah 11 Other issues include, increase in table size, maintenance and loss of information:  The size of the (largest table i.e.) transaction table increases by the size of the Sale_Person key.  For the example being considered, the detail table size increases from 1.2 GB to 1.32 GB.  If the Sale_Person key changes (e.g. new 12 digit NID), then updates to be reflected all the way to transaction table.  In the absence of 1:M relationship, column movement will actually result in loss of data. Other Issues: Adding redundant columnsOther Issues: Adding redundant columns
  • 12. Ahsan Abdullah 12 Horizontal splitting is a Divide&Conquer techniqueHorizontal splitting is a Divide&Conquer technique that exploits parallelism. The conquer part of thethat exploits parallelism. The conquer part of the technique is about combining the results.technique is about combining the results. Lets see how it works forLets see how it works for hash basedhash based splitting/partitioning.splitting/partitioning.  Assuming uniform hashing, hash splitting supportsAssuming uniform hashing, hash splitting supports even data distribution across all partitions in a pre-even data distribution across all partitions in a pre- defined manner.defined manner.  However, hash based splitting is not easilyHowever, hash based splitting is not easily reversible to eliminate the split.reversible to eliminate the split. Ease of use Issues: Horizontal SplittingEase of use Issues: Horizontal Splitting
  • 13. Ahsan Abdullah 13 Ease of use Issues: Horizontal SplittingEase of use Issues: Horizontal Splitting ?
  • 14. Ahsan Abdullah 14 Round robin and random splitting:Round robin and random splitting: Guarantee good data distribution.Guarantee good data distribution. Almost impossible to reverse (or undo).Almost impossible to reverse (or undo). Not pre-defined.Not pre-defined. Ease of use Issues: Horizontal SplittingEase of use Issues: Horizontal Splitting
  • 15. Ahsan Abdullah 15 Range and expression splitting:Range and expression splitting: Can facilitate partition elimination with aCan facilitate partition elimination with a smart optimizer.smart optimizer. Generally lead to "hot spots” (unevenGenerally lead to "hot spots” (uneven distribution of data).distribution of data). Ease of use Issues: Horizontal SplittingEase of use Issues: Horizontal Splitting
  • 16. Ahsan Abdullah 16 Performance Issues: Horizontal SplittingPerformance Issues: Horizontal Splitting P1 P2 P3 P4 1998 1999 2000 2001 Dramatic cancellation of airline reservations after 9/11, resulting in “hot spot” Splitting based on year Processors
  • 17. Ahsan Abdullah 17 Performance issues: Vertical Splitting FactsPerformance issues: Vertical Splitting Facts Example:Example: Consider a 100 byte header for theConsider a 100 byte header for the member table such that 20 bytes providemember table such that 20 bytes provide complete coverage for 90% of the queries.complete coverage for 90% of the queries. Split the member table into two parts as follows:Split the member table into two parts as follows: 1.1. Frequently accessed portion of table (20 bytes),Frequently accessed portion of table (20 bytes), andand 2. Infrequently accessed portion of table (80+2. Infrequently accessed portion of table (80+ bytes).bytes). Why 80+?Why 80+? Note that primary key (member_id) must beNote that primary key (member_id) must be present in both tables for eliminating the split.present in both tables for eliminating the split.
  • 18. Ahsan Abdullah 18 Scanning the claim table for mostScanning the claim table for most frequently used queries will be 500% fasterfrequently used queries will be 500% faster with vertical splittingwith vertical splitting IronicallyIronically, for the “infrequently” accessed, for the “infrequently” accessed queries the performance will be inferior asqueries the performance will be inferior as compared to the un-split table because ofcompared to the un-split table because of the join overhead.the join overhead. Performance issues: Vertical Splitting Good vs. BadPerformance issues: Vertical Splitting Good vs. Bad

Editor's Notes

  1. <number>
  2. <number>
  3. <number>
  4. <number>
  5. <number>
  6. <number>
  7. <number>
  8. <number>
  9. <number>
  10. <number>
  11. <number>
  12. <number>
  13. <number>
  14. <number>