Rushdi Shams, Dept of CSE, KUET
Database Systems
Data Warehousing
Version 1.0
L16 l17 datawarehouse

  1. Database Systems: Data Warehousing (Version 1.0)
  2. The Advent of Data Warehousing
     • The existing database models were not suitable to meet the requirements.
     • The requirements fall into two categories:
       1. Operational use
       2. Decision support use
  3. Operational Use
     • Requires a precise, accurate, and instant picture of the database
     • Day-to-day business, in order: customer comes; orders parts; search the parts; book/purchase the parts; add dates; bank transactions on the purchase/booking; invoice
  4. Operational Use
     • Direct customer-company interaction
     • All the information is processed instantaneously (or almost instantaneously)
  5. Decision Support Use
     • Operational use widens the scope: which customer, where he lives, his phone number, which part he bought, how much he paid, the date, and so on
     • Decision support use narrows the scope to business-related questions only: which customer, which part he bought, how much he paid, and the date
  6. Decision Support Use
     • ... and the benefits are:
     • In December, the company may need to stock more DDR RAM than HDDs
     • SATA HDDs sell better than PATA HDDs
     • Mr. X is our valued customer who bought most of the RAM, and Mr. Y is our valued customer who bought most of the SATA HDDs
  7. And The War Begins…
     • So, the conflict between lightspeed applications (OLTP) and slow future predictions led to the advent of data warehousing.
  8. Relational Databases
     • Too granular, too many little pieces
     • Larger transactions take longer to process because those little pieces must be joined
     • Very effective for front-end applications that are accessed by many people very frequently
     • Require only modest hardware
  9. Data Warehousing
     • Processes large amounts of information
     • Very few users (mainly the owners)
     • Mainly for reporting and analysis
     • Hardware requirements are huge
  10. The Relation Between Them
     • A data warehouse is the simplest form of relational database
     • Try to only add and remove data, because changing data most often requires huge data processing
     • You often make mistakes in keys with just two records; here you are dealing with millions of records, so think about what data modifications would involve
  11. The Relation Between Them
     • The one-to-many / many-to-many / many-to-one relationships and key constraints of the relational model are still present in data warehousing
  12. The Dimensional Data Model
     • So, if a data warehouse needs a data model different from the relational model, what would that be?
     • The answer is the dimensional data model
  13. The Dimensional Data Model
     • Contains:
       1. Facts
       2. Dimensions
     • The fact table contains transactions, for example the invoices of all customers for the last 5 years.
     • The dimension tables describe the fact table.
  14. The Dimensional Data Model (figure; labels: Static Data, Dynamic Data)
  15. The Star Schema
     • The most effective approach to modelling data with the dimensional data model is the star schema
  16. The Star Schema (figure)
  17. The Star Schema: Equivalent Diagram (figure)
  18. The Star Schema: Properties
     • A star schema contains a fact table, which grows as time goes by, is very dynamic, and changes all the time
     • A star schema contains dimension tables, which are static and change very little as time goes by
     • A star schema keeps queries that join a bulky fact table with dimension tables simple and inexpensive in time
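The star schema properties above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (customer, part, sale), their columns, and the sample rows are assumptions made here, loosely based on the parts-shop example from the earlier slides, since the original diagrams did not survive.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, static, descriptive.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE part (part_id INTEGER PRIMARY KEY, description TEXT);

-- Fact table: dynamic, one row per transaction, pointing at the dimensions.
CREATE TABLE sale (
    sale_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    part_id INTEGER REFERENCES part(part_id),
    sale_date TEXT,
    amount REAL
);

INSERT INTO customer VALUES (1, 'Mr. X'), (2, 'Mr. Y');
INSERT INTO part VALUES (1, 'DDR RAM'), (2, 'SATA HDD');
INSERT INTO sale VALUES
    (1, 1, 1, '2005-12-01', 40.0),
    (2, 1, 1, '2005-12-02', 45.0),
    (3, 2, 2, '2005-12-02', 80.0);
""")

# Every dimension is exactly one join away from the fact table.
star_rows = conn.execute("""
SELECT c.name, p.description, s.sale_date, s.amount
FROM sale AS s
JOIN customer AS c ON c.customer_id = s.customer_id
JOIN part AS p ON p.part_id = s.part_id
ORDER BY s.sale_id
""").fetchall()

for row in star_rows:
    print(row)
```

Because each dimension hangs directly off the fact table, the query needs one join per dimension and no further layers.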
  19. The Snowflake Schema
     • A normalized star schema
     • Only the dimensions are normalized
     • The result is a fact table connected directly to some dimension tables, with some dimension tables connected to other dimension tables
  20. The Snowflake Schema (figure; labels: Fact Table, Normalized Dimension, Dimension)
  21. The Snowflake Schema: Equivalent View (figure)
  22. The Problem
     • Not too many tables, but too many layers
     • The most used relational algebra operation in a dimensional database is the join
     • Too many tables in joins means too much overhead
     • There are not many tables here
     • But there are too many layers: joining one table requires joining the other related tables
     • And if one of those tables (the fact table) has trillions of rows, you are dead!
  23. The Problem
     • If the SALE fact table has 1 million records, and all dimensions contain 10 records each, a Cartesian product would return $10^{6}$ multiplied by $10^{9}$ records. That makes $10^{15}$ records.
  24. The Solution
     • Convert the snowflake schema into a star schema.
  25. The Solution (figure)
  26. The Solution
     • A join occurs between one fact table and six dimension tables. That is a Cartesian product of $10^{6}$ multiplied by $10^{6}$, resulting in $10^{12}$ records returned.
  27. The Difference
     • The difference between $10^{12}$ and $10^{15}$ is three decimal places.
     • Three decimal places is not just three zeroes and thus 1,000 records. The difference is actually 1,000,000,000,000,000 - 1,000,000,000,000 = 999,000,000,000,000.
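The arithmetic on the three slides above can be checked with a few lines of Python. The count of nine dimension tables in the snowflake case is an assumption inferred from the slide's $10^{9}$ figure (10 records each); the six dimension tables of the star case come from the slide itself.

```python
# Worst-case Cartesian product sizes for the two schemas.
fact_rows = 10**6            # 1 million records in the SALE fact table
rows_per_dimension = 10      # each dimension table holds 10 records
snowflake_dimensions = 9     # assumption: inferred from the slide's 10^9
star_dimensions = 6          # stated on the slide

snowflake_product = fact_rows * rows_per_dimension**snowflake_dimensions
star_product = fact_rows * rows_per_dimension**star_dimensions

difference = snowflake_product - star_product
ratio = snowflake_product // star_product

print(f"snowflake: {snowflake_product:,}")
print(f"star:      {star_product:,}")
print(f"difference: {difference:,} (a factor of {ratio:,})")
```

The ratio is a factor of 1,000, while the absolute difference is 999,000,000,000,000 records, which is the point the last slide makes.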
  28. Types of Dimension Tables
     • The dimension tables shown so far are inadequate
     • Typically, there are some conventions for dimension tables.
     • For example, dates and locations are two common dimension tables in data warehouses.
     • Why? Most businesses share two common concerns: the date of a transaction and the place of shipment/delivery
  29. Types of Dimension Tables: Dates (figure)
  30. Types of Dimension Tables: Dates (figure)
  31. Types of Dimension Tables: Locations
     • Locations, states, countries, continents, etc.
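As a rough sketch of what a date dimension could look like, the snippet below generates one row per calendar day. The column names (date_id, full_date, year, quarter, month, day) and the helper build_date_dimension are hypothetical; the slides' own date diagrams were lost in extraction, so this is not the author's layout.

```python
from datetime import date, timedelta


def build_date_dimension(start, days):
    """Return one dictionary per calendar day, starting at `start`."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "date_id": offset + 1,              # simple sequential key
            "full_date": d.isoformat(),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,  # months 1-3 -> Q1, 4-6 -> Q2, ...
            "month": d.month,
            "day": d.day,
        })
    return rows


# Four days spanning a year boundary.
date_dim = build_date_dimension(date(2005, 12, 30), 4)
for row in date_dim:
    print(row)
```

A location dimension could follow the same idea, with one column per level (location, state, country, continent) instead of year, quarter, and month.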
  32. Let's Create a Data Warehouse Model
  33. The Relational Model (figure)
  34. Step 1
     • Identify the fact table
     • The fact table contains (mostly) transactions that occur on a day-to-day basis, that are related to money, or that are the main purpose of the business
  35. Step 1: Finding the Fact Table (figure)
  36. Step 1
     • So, our fact table would be (in this case) Royalty
  37. Step 2: Find Dimension Tables
     • Find the tables that are static, not dynamic; the dynamic one is the fact table.
     • We will look at both the static (dimension) tables and the dynamic (fact) tables once we finish step 3
  38. Step 3
     • Develop a snowflake schema with the fact and dimension tables
  39. Step 3: Snowflake Schema (figure)
  40. Step 3: Snowflake Schema (figure)
  41. Step 4
     • Develop a star schema by denormalizing the snowflake schema
  42. Step 4: Star Schema (figure)
  43. Step 4: Star Schema (figure)
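Step 4 can be sketched as follows, assuming a snowflaked dimension in which a part table points at a separate category table. All table and column names (category, part, part_dim) are hypothetical and are not taken from the Royalty example, whose diagrams are not available here. Denormalizing folds the outer layer into a single flat dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake form: the part dimension is normalized into two layers.
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE part (
    part_id INTEGER PRIMARY KEY,
    description TEXT,
    category_id INTEGER REFERENCES category(category_id)
);
INSERT INTO category VALUES (1, 'Memory'), (2, 'Storage');
INSERT INTO part VALUES (1, 'DDR RAM', 1), (2, 'SATA HDD', 2), (3, 'PATA HDD', 2);

-- Star form: one flat dimension table with the category name copied in.
CREATE TABLE part_dim AS
SELECT p.part_id, p.description, c.category_name
FROM part AS p
JOIN category AS c ON c.category_id = p.category_id;
""")

part_dim_rows = conn.execute(
    "SELECT part_id, description, category_name FROM part_dim ORDER BY part_id"
).fetchall()

for row in part_dim_rows:
    print(row)
```

The category name is now repeated for every part, which costs some storage but removes one join layer from every query against the fact table.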
  44. Surrogate Key: Important Key in Data Warehouse
     • A customer is identified in table 1 by customer name
     • The same person is identified in table 2 by telephone number
     • The same person is identified in table 3 by SSN
     • If tables 1, 2, and 3 all have to become dimension tables, the fact table cannot recognize the same person through three different foreign keys from those tables
  45. Surrogate Key: Important Key in Data Warehouse (figure)
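One way to picture the surrogate key idea is a single customer dimension whose primary key is a generated number, with the name, telephone number, and SSN kept as ordinary columns. The sketch below assumes hypothetical table names (customer_dim, sale_fact), a hypothetical helper find_key, and made-up sample values; the slide's own diagram is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- customer_key is the surrogate key: generated, with no business meaning.
CREATE TABLE customer_dim (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    phone TEXT,
    ssn TEXT
);
-- The fact table refers to the customer only through the surrogate key.
CREATE TABLE sale_fact (
    customer_key INTEGER REFERENCES customer_dim(customer_key),
    amount REAL
);
""")

cur = conn.execute(
    "INSERT INTO customer_dim (name, phone, ssn) VALUES (?, ?, ?)",
    ("Mr. X", "555-0100", "000-00-0000"),  # made-up sample values
)
customer_key = cur.lastrowid
conn.executemany(
    "INSERT INTO sale_fact VALUES (?, ?)",
    [(customer_key, 40.0), (customer_key, 45.0)],
)


def find_key(column, value):
    """Look up the surrogate key from any of the three natural identifiers."""
    if column not in {"name", "phone", "ssn"}:
        raise ValueError(f"unknown identifier column: {column}")
    row = conn.execute(
        f"SELECT customer_key FROM customer_dim WHERE {column} = ?", (value,)
    ).fetchone()
    return row[0] if row else None


# All three source-system identifiers resolve to the same surrogate key.
keys = {
    find_key("name", "Mr. X"),
    find_key("phone", "555-0100"),
    find_key("ssn", "000-00-0000"),
}
total = conn.execute(
    "SELECT SUM(amount) FROM sale_fact WHERE customer_key = ?", (customer_key,)
).fetchone()[0]
print(keys, total)
```

Whichever identifier a source table uses, the fact table sees one customer, because it stores only the surrogate key.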
  46. Understanding the Fact Table
     • Facts are numeric values
     • Facts are not the foreign key fields
     • The foreign keys provide more detail about a fact; the facts themselves are the main focus of the business
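The split between numeric facts and foreign keys can be seen in a small aggregate query. In this sketch the fact columns are quantity and amount, and the foreign key part_key only supplies the descriptive detail; all names and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part_dim (part_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE sale_fact (
    part_key INTEGER REFERENCES part_dim(part_key),  -- foreign key: detail
    quantity INTEGER,                                -- fact: numeric
    amount REAL                                      -- fact: numeric
);
INSERT INTO part_dim VALUES (1, 'DDR RAM'), (2, 'SATA HDD'), (3, 'PATA HDD');
INSERT INTO sale_fact VALUES (1, 2, 80.0), (1, 1, 40.0), (2, 1, 90.0), (3, 1, 60.0);
""")

# The facts are summed; the dimension only labels each group.
totals = conn.execute("""
SELECT p.description, SUM(f.quantity), SUM(f.amount)
FROM sale_fact AS f
JOIN part_dim AS p ON p.part_key = f.part_key
GROUP BY p.description
ORDER BY SUM(f.amount) DESC, p.description
""").fetchall()

for row in totals:
    print(row)
```

Reports of the kind described on the decision support slides (which part sells best, which customer buys most) are aggregates of this shape.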
  47. Reference
     • Beginning Database Design by Gavin Powell, Wrox Publications, 2005
