Datawarehousing with MySQL



Dimensional Data Warehousing with MySQL



  3. 3. CHAPTER 1-BASIC CONCEPTS Chapter 1 of the book Dimensional Data Warehousing with MySQL covers four tasks: • Creating a database user • Creating the data warehouse database and the source database • Creating data warehouse tables • Generating surrogate keys For practice, the first task (creating a database user) is skipped and the root account is used instead. All other tasks are shown with a screenshot of the outcome.
  4. 4. CREATING A DATABASE In this task, two databases, dw and source, are created using the following commands: CREATE DATABASE dw; CREATE DATABASE source;
  5. 5. CREATING DATAWAREHOUSING TABLES Creating customer_dim table:
  6. 6. CREATING DATAWAREHOUSING TABLES Creating product_dim table:
  7. 7. CREATING DATAWAREHOUSING TABLES Creating order_dim table:
  8. 8. CREATING DATAWAREHOUSING TABLES Creating date_dim table:
  9. 9. CREATING DATAWAREHOUSING TABLES Creating sales_order_fact table:
  10. 10. GENERATING SURROGATE KEYS In the customer_dim table, three rows are inserted with a NULL value for the customer_sk field. The surrogate keys are then generated automatically.
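The slide's screenshot is not part of this transcript, so the following is only a minimal sketch of the idea. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server (SQLite spells the feature AUTOINCREMENT, MySQL spells it AUTO_INCREMENT). The customer names are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the dw database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_dim (
        customer_sk     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_number INTEGER,
        customer_name   TEXT
    )
""")

# Three rows are inserted with NULL for customer_sk; the names are hypothetical.
rows = [(1, "Customer A"), (2, "Customer B"), (3, "Customer C")]
conn.executemany(
    "INSERT INTO customer_dim (customer_sk, customer_number, customer_name) "
    "VALUES (NULL, ?, ?)",
    rows,
)

# The database fills in the surrogate keys on its own.
surrogate_keys = [
    r[0] for r in conn.execute("SELECT customer_sk FROM customer_dim ORDER BY customer_sk")
]
print(surrogate_keys)
```

Passing NULL explicitly mirrors what the slide describes: the application never chooses the key, the database does.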
  12. 12. • Slowly Changing Dimension (SCD) is the technique for implementing dimension history in a dimensional data warehouse • For practice, the SCD1 and SCD2 techniques are used • Slowly Changing Dimension Type 1 (SCD1) involves updating and inserting into the customer_dim table • Before updating and inserting data, a customer_stg table has to be created
  13. 13. • The UPDATE statement copies the value of the customer_name column in the staging table to the customer_name column in the customer_dim table • The INSERT statement inserts the records in the staging table that are not yet present in the customer_dim table • Running the script updates the name of the first customer and inserts the seventh customer from the staging table into the customer_dim table
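The update-then-insert pattern described above can be sketched as follows. This is an illustration under assumptions, not the book's script: it runs on Python's built-in sqlite3 rather than MySQL, the staging table is reduced to two columns, and all customer names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE customer_dim (
        customer_sk     INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_number INTEGER,
        customer_name   TEXT
    );
    CREATE TABLE customer_stg (
        customer_number INTEGER,
        customer_name   TEXT
    );
    -- Hypothetical sample data: customer 1 was renamed, customer 7 is new.
    INSERT INTO customer_dim (customer_number, customer_name)
    VALUES (1, 'Old Name'), (2, 'Unchanged Name');
    INSERT INTO customer_stg
    VALUES (1, 'New Name'), (2, 'Unchanged Name'), (7, 'Seventh Customer');
""")

# SCD1 step 1: overwrite the name in place, so no history is kept.
conn.execute("""
    UPDATE customer_dim
    SET customer_name = (
        SELECT s.customer_name FROM customer_stg s
        WHERE s.customer_number = customer_dim.customer_number
    )
    WHERE EXISTS (
        SELECT 1 FROM customer_stg s
        WHERE s.customer_number = customer_dim.customer_number
          AND s.customer_name <> customer_dim.customer_name
    )
""")

# SCD1 step 2: insert staged customers that are not yet in the dimension.
conn.execute("""
    INSERT INTO customer_dim (customer_number, customer_name)
    SELECT s.customer_number, s.customer_name
    FROM customer_stg s
    WHERE s.customer_number NOT IN (SELECT customer_number FROM customer_dim)
""")

result = conn.execute(
    "SELECT customer_number, customer_name FROM customer_dim ORDER BY customer_number"
).fetchall()
print(result)
```

After the run, customer 1 carries only the new name; the old name is gone, which is the defining property of SCD1.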
  14. 14. Creating and Loading Customer Staging Table
  15. 15. Applying SCD1: Updating Existing Customers and inserting into customer_dim
  16. 16. Slowly Changing Dimension Type 2 (SCD2) • SCD2 has been applied to the product_dim table • Whenever there is a change in the product_name or product_category column, SCD2 expires the existing row and adds a new row that describes the same product.
  17. 17. Creating a product_stg file
  18. 18. Applying SCD2 to the product_name and product_category in the product_dim table
  19. 19. • The next output shows that SCD2 has been applied successfully • Product 1 has two rows • One of the rows, with product_sk 1, has expired with expiry date 4 February 2007 • This is one day before the effective date of the new row • Another row is created with product_sk 3 and the new name • Its effective date is 5 February 2007 and its expiry date is 9999-12-31 • This means that it has not yet expired
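The expire-then-insert behaviour described above can be sketched as follows. The sketch assumes a product_dim with product_code, effective_date and expiry_date columns, uses Python's built-in sqlite3 in place of MySQL, and invents the product names, categories and the original effective date (2005-03-01) purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE product_dim (
        product_sk       INTEGER PRIMARY KEY AUTOINCREMENT,
        product_code     INTEGER,
        product_name     TEXT,
        product_category TEXT,
        effective_date   TEXT,
        expiry_date      TEXT
    );
    CREATE TABLE product_stg (
        product_code INTEGER, product_name TEXT, product_category TEXT
    );
    -- Hypothetical data: product 1 gets a new name, product 2 is unchanged.
    INSERT INTO product_dim
        (product_code, product_name, product_category, effective_date, expiry_date)
    VALUES (1, 'Old Product Name', 'Storage', '2005-03-01', '9999-12-31'),
           (2, 'Other Product',    'Monitor', '2005-03-01', '9999-12-31');
    INSERT INTO product_stg
    VALUES (1, 'New Product Name', 'Storage'), (2, 'Other Product', 'Monitor');
""")

load_date = "2007-02-05"  # effective date of the new row, as on the slide

# Step 1: expire the current row of every product whose name or category
# changed. The expiry date is the day before the load date.
conn.execute("""
    UPDATE product_dim
    SET expiry_date = date(?, '-1 day')
    WHERE expiry_date = '9999-12-31'
      AND EXISTS (
          SELECT 1 FROM product_stg s
          WHERE s.product_code = product_dim.product_code
            AND (s.product_name <> product_dim.product_name
                 OR s.product_category <> product_dim.product_category)
      )
""", (load_date,))

# Step 2: add a new current row for every staged product that no longer
# has an unexpired row in the dimension.
conn.execute("""
    INSERT INTO product_dim
        (product_code, product_name, product_category, effective_date, expiry_date)
    SELECT s.product_code, s.product_name, s.product_category, ?, '9999-12-31'
    FROM product_stg s
    WHERE NOT EXISTS (
        SELECT 1 FROM product_dim d
        WHERE d.product_code = s.product_code AND d.expiry_date = '9999-12-31'
    )
""", (load_date,))

history = conn.execute("""
    SELECT product_sk, product_name, effective_date, expiry_date
    FROM product_dim WHERE product_code = 1 ORDER BY product_sk
""").fetchall()
print(history)
```

Both rows for product 1 survive, each with its own surrogate key, so facts loaded before and after the change keep pointing at the version that was valid at the time.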
  20. 20. Chapter 3: Measure Additivity
  21. 21. Testing Full Additivity: Inserting data into order_dim table
  22. 22. Testing Full Additivity: Inserting data into table date_dim
  23. 23. Testing Full Additivity: Inserting data into sales_order_fact:
  24. 24. Testing Full Additivity: Generating the sum of the total order amounts by querying across all dimensions:
  25. 25. Testing Full Additivity: Generating the sum by querying across date, customer and order
  26. 26. Testing Full Additivity: Generating the sum of total orders by querying across date and order:
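The three checks above share one idea: a fully additive measure gives the same grand total no matter which dimensions it is grouped by. A minimal sketch of that test follows, assuming a sales_order_fact table with the four surrogate keys and order_amount. It groups directly on the key columns of the fact table instead of joining the dimension tables, runs on Python's built-in sqlite3 rather than MySQL, and uses hypothetical amounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE sales_order_fact (
        order_sk INTEGER, customer_sk INTEGER, product_sk INTEGER,
        date_sk INTEGER, order_amount INTEGER
    );
    -- Hypothetical fact rows.
    INSERT INTO sales_order_fact VALUES
        (1, 1, 1, 1, 1000),
        (2, 2, 2, 1, 1000),
        (3, 3, 1, 2, 4000),
        (4, 1, 2, 2, 4000),
        (5, 2, 1, 3, 6000);
""")

def total_grouped_by(*keys):
    """Sum order_amount per group of the given key columns, then add the groups up."""
    cols = ", ".join(keys)
    rows = conn.execute(
        f"SELECT {cols}, SUM(order_amount) FROM sales_order_fact GROUP BY {cols}"
    ).fetchall()
    return sum(row[-1] for row in rows)

totals = {
    "all dimensions": total_grouped_by("date_sk", "customer_sk", "product_sk", "order_sk"),
    "date, customer, order": total_grouped_by("date_sk", "customer_sk", "order_sk"),
    "date, order": total_grouped_by("date_sk", "order_sk"),
}
grand_total = conn.execute("SELECT SUM(order_amount) FROM sales_order_fact").fetchone()[0]
print(grand_total, totals)
```

If any grouping produced a different total, order_amount would not be fully additive across those dimensions.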
  27. 27. Chapter 4: Dimensional Queries
  28. 28. Aggregate Queries • Aggregate queries aggregate individual facts • The values are either summed or counted • Two examples are run here: aggregation of daily sales and of annual sales • In all cases, joins between tables are done using surrogate keys.
  29. 29. Daily Sales Aggregation: The aggregation of the order amounts and number of orders is done by date
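A minimal sketch of a daily aggregation follows. It joins the fact table to date_dim on the surrogate key and both sums and counts per date. The tables are trimmed to the columns needed, the name of the date column in date_dim ("date") is an assumption, the sample rows are hypothetical, and the query runs on Python's built-in sqlite3 rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    -- The column name "date" is an assumption; the slide's table is not shown.
    CREATE TABLE date_dim (date_sk INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE sales_order_fact (
        order_sk INTEGER, date_sk INTEGER, order_amount INTEGER
    );
    -- Hypothetical sample rows.
    INSERT INTO date_dim VALUES (1, '2007-02-05'), (2, '2007-02-06');
    INSERT INTO sales_order_fact VALUES (1, 1, 1000), (2, 1, 2500), (3, 2, 4000);
""")

# Join on the surrogate key, then sum the amounts and count the orders per date.
daily_sales = conn.execute("""
    SELECT d.date, SUM(f.order_amount) AS total_amount, COUNT(*) AS order_count
    FROM sales_order_fact f
    JOIN date_dim d ON f.date_sk = d.date_sk
    GROUP BY d.date
    ORDER BY d.date
""").fetchall()
print(daily_sales)
```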
  30. 30. Annual Sales Aggregation: The order amounts and the number of orders are not only aggregated by date, but also by product and customer city
  31. 31. Specific Queries: Monthly Storage Product Sales: The following query aggregates sales amount and the number of orders per month.
  32. 32. Specific Queries: Quarterly Sales in Mechanicsburg: The following query produces the quarterly aggregation of the order amounts in Mechanicsburg
  33. 33. Inside-Out Queries: Product Performer: The following query gives you the sales orders of products that have a monthly sales amount of 7,500 or more.
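The filtering step of such a query can be sketched with GROUP BY and HAVING: first aggregate per product and month, then keep only the groups whose total reaches 7,500. This covers only that inner step, not the full query from the slide. The month and year columns of date_dim, the product names and the amounts are assumptions, and the query runs on Python's built-in sqlite3 rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE product_dim (product_sk INTEGER PRIMARY KEY, product_name TEXT);
    -- The month and year columns are assumptions about date_dim.
    CREATE TABLE date_dim (
        date_sk INTEGER PRIMARY KEY, date TEXT, month INTEGER, year INTEGER
    );
    CREATE TABLE sales_order_fact (
        order_sk INTEGER, product_sk INTEGER, date_sk INTEGER, order_amount INTEGER
    );
    -- Hypothetical sample rows.
    INSERT INTO product_dim VALUES (1, 'Product A'), (2, 'Product B');
    INSERT INTO date_dim VALUES
        (1, '2007-01-10', 1, 2007),
        (2, '2007-01-20', 1, 2007),
        (3, '2007-02-05', 2, 2007);
    INSERT INTO sales_order_fact VALUES
        (1, 1, 1, 5000),   -- Product A, January
        (2, 1, 2, 3000),   -- Product A, January (month total 8000)
        (3, 2, 1, 7000),   -- Product B, January (below the threshold)
        (4, 2, 3, 7500);   -- Product B, February (exactly at the threshold)
""")

# Aggregate per product and month, then keep only months with 7,500 or more.
performers = conn.execute("""
    SELECT p.product_name, d.year, d.month, SUM(f.order_amount) AS monthly_amount
    FROM sales_order_fact f
    JOIN product_dim p ON f.product_sk = p.product_sk
    JOIN date_dim d ON f.date_sk = d.date_sk
    GROUP BY p.product_name, d.year, d.month
    HAVING SUM(f.order_amount) >= 7500
    ORDER BY p.product_name, d.year, d.month
""").fetchall()
print(performers)
```

The condition sits in HAVING rather than WHERE because it tests the aggregated monthly total, not individual fact rows.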
  34. 34. Inside-Out Queries: Loyal Customer: The following query shows customers who have placed more than five orders annually in the past 18 months
  35. 35. Chapter 5: Source Extraction
  36. 36. • Push-by-source CDC (change data capture) means that the source system extracts only the changes since the last extraction • Push-by-source CDC has been demonstrated on the sales order source data • It is done using a stored procedure that extracts sales order data from the sales_order table in the source database
  37. 37. Creating a sales_order table in another database called source and inserting values in the tables Order_dim:
  38. 38. Date_dim
  39. 39. Inserting values in the order_dim and date_dim tables in the dw database
  40. 40. Running the following stored procedure:
      USE source;
      DELIMITER //
      DROP PROCEDURE IF EXISTS push_sales_order //
      CREATE PROCEDURE push_sales_order()
      BEGIN
        -- Push only the sales orders entered today into the fact table
        INSERT INTO dw.sales_order_fact
        SELECT a.order_amount, b.order_sk, c.customer_sk, d.product_sk, e.date_sk
        FROM sales_order a, dw.order_dim b, dw.customer_dim c, dw.product_dim d, dw.date_dim e
        WHERE a.entry_date = CURRENT_DATE
          AND a.order_number = b.order_number
          AND a.customer_number = c.customer_number
          AND a.product_code = d.product_code
          AND a.order_date >= d.effective_date
          AND a.order_date <= d.expiry_date
          -- The right-hand side is missing on the slide; e.date assumes the
          -- date column of date_dim is named "date"
          AND a.order_date = e.date;
      END //
      DELIMITER ;
  41. 41. The above stored procedure will make changes to the sales_order_fact table in dw database.
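The core of the push-by-source idea, extracting only the rows entered on the run date, can be sketched outside a stored procedure as follows. This is a reduced illustration: it uses Python's built-in sqlite3 with all tables in one database, joins only order_dim, and takes the run date as a parameter (run_date, a hypothetical name) instead of CURRENT_DATE so that the result is repeatable. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE sales_order (
        order_number INTEGER, order_amount INTEGER, entry_date TEXT
    );
    CREATE TABLE order_dim (
        order_sk INTEGER PRIMARY KEY AUTOINCREMENT, order_number INTEGER
    );
    CREATE TABLE sales_order_fact (order_sk INTEGER, order_amount INTEGER);
    -- Hypothetical source rows: one entered on 5 February, two on 6 February.
    INSERT INTO sales_order VALUES
        (1, 1000, '2007-02-05'), (2, 2500, '2007-02-06'), (3, 4000, '2007-02-06');
    INSERT INTO order_dim (order_number) VALUES (1), (2), (3);
""")

def push_sales_order(conn, run_date):
    """Load only the sales orders whose entry_date equals run_date."""
    conn.execute("""
        INSERT INTO sales_order_fact (order_sk, order_amount)
        SELECT b.order_sk, a.order_amount
        FROM sales_order a
        JOIN order_dim b ON a.order_number = b.order_number
        WHERE a.entry_date = ?
    """, (run_date,))

push_sales_order(conn, "2007-02-06")
fact_rows = conn.execute(
    "SELECT order_sk, order_amount FROM sales_order_fact ORDER BY order_sk"
).fetchall()
print(fact_rows)
```

Only the two orders entered on the run date reach the fact table; the order from the previous day is assumed to have been pushed by the previous day's run.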
  42. 42. Chapter 6 : Populating the Date Dimension
  43. 43. Pre-population: This is the simplest of the three techniques: dates are inserted in advance for a whole period of time. For example, dates could be inserted for the five years between 2009 and 2014. Truncating the date_dim table:
  44. 44. One Date Every Day: This technique is similar to pre-population, but only one date is loaded per day. Daily date population:
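Pre-population amounts to a loop that inserts one row per calendar day. A minimal sketch follows, assuming a date_dim with date, month, quarter and year columns (the actual table definition is in a screenshot that is not part of this transcript). The helper name pre_populate and the exact range, 1 January 2009 through 31 December 2013, are assumptions chosen to give five full calendar years. It runs on Python's built-in sqlite3 rather than MySQL.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("""
    CREATE TABLE date_dim (
        date_sk INTEGER PRIMARY KEY AUTOINCREMENT,
        date    TEXT,      -- column names are assumptions
        month   INTEGER,
        quarter INTEGER,
        year    INTEGER
    )
""")

def pre_populate(conn, start, end):
    """Insert one date_dim row for every day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        quarter = (current.month - 1) // 3 + 1
        rows.append((current.isoformat(), current.month, quarter, current.year))
        current += timedelta(days=1)
    conn.executemany(
        "INSERT INTO date_dim (date, month, quarter, year) VALUES (?, ?, ?, ?)", rows
    )
    return len(rows)

# Hypothetical range: five full calendar years, including leap year 2012.
inserted = pre_populate(conn, date(2009, 1, 1), date(2013, 12, 31))
first_date, last_date = conn.execute(
    "SELECT MIN(date), MAX(date) FROM date_dim"
).fetchone()
print(inserted, first_date, last_date)
```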
  45. 45. Loading dates from the source: The query loads the sales order dates from the sales_order table of the source database into the date_dim table of the DW database.
  46. 46. Adding more dates from the additional sales order
  47. 47. Chapter 7: Initial Population
  48. 48. Initial Population: After identifying the source data, a script is written for initial population Order_dim table:
  49. 49. Sales_order_fact table
  50. 50. Running the Initial Population Scheme Truncating the sales_order table:
  51. 51. Running the Initial Population Scheme Preparing the sales order table
  52. 52. Query to confirm that the sales orders were loaded correctly
  53. 53. Chapter 8: Regular Population
  54. 54. Regular Population Script: In this script, customer_dim and product_dim are reloaded with data. SCD2 is applied to customer addresses, product names, and product groups. SCD1 is applied to customer names. Order_dim table:
  55. 55. Product_dim:
  56. 56. Product_stg:
  57. 57. Sales_order_fact:
  58. 58. Customer_dim:
  59. 59. Testing Data: The data is tested by running the select query on the sales_order table and the sales_order_fact table
  60. 60. Chapter 10: Adding Columns
  61. 61. Adding New Columns to the Customer Dimension: When the two new columns are added, existing rows display NULL values in them. Customer_dim table:
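The NULL behaviour described above can be reproduced with a short sketch. The slide does not name the two new columns, so shipping_city and shipping_state are hypothetical, as are the customer names. The sketch runs on Python's built-in sqlite3 rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE customer_dim (
        customer_sk     INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_number INTEGER,
        customer_name   TEXT
    );
    -- Hypothetical existing rows.
    INSERT INTO customer_dim (customer_number, customer_name)
    VALUES (1, 'Customer A'), (2, 'Customer B');

    -- Hypothetical column names; the slide only says two columns are added.
    ALTER TABLE customer_dim ADD COLUMN shipping_city TEXT;
    ALTER TABLE customer_dim ADD COLUMN shipping_state TEXT;
""")

# Rows that existed before the ALTER TABLE show NULL (None in Python)
# in the new columns until a later load fills them.
existing_rows = conn.execute("""
    SELECT customer_number, shipping_city, shipping_state
    FROM customer_dim ORDER BY customer_number
""").fetchall()
print(existing_rows)
```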
  62. 62. Customer_stg table:
  63. 63. • Adding the order_quantity column to sales_order