Datawarehousing with MySQL

Dimensional Data Warehousing with MySQL

Transcript

  • 1. DATAWAREHOUSING WITH MySQL
  • 2. CHAPTER 1-BASIC CONCEPTS
  • 3. CHAPTER 1-BASIC CONCEPTS Chapter 1 of the book Dimensional Data Warehousing with MySQL covers four main tasks: • Creating a database user • Creating the data warehouse database and the source database • Creating data warehouse tables • Generating surrogate keys For practice, the first task (creating a database user) has been skipped and the root account is used instead. All other tasks are covered, each with a screenshot of the outcome.
  • 4. CREATING A DATABASE In this task, two databases, dw and source, are created using the following commands: CREATE DATABASE dw; CREATE DATABASE source;
  • 5. CREATING DATAWAREHOUSING TABLES Creating customer_dim table:
  • 6. CREATING DATAWAREHOUSING TABLES Creating product_dim table:
  • 7. CREATING DATAWAREHOUSING TABLES Creating order_dim table:
  • 8. CREATING DATAWAREHOUSING TABLES Creating date_dim table:
  • 9. CREATING DATAWAREHOUSING TABLES Creating sales_order_fact table:
  • 10. GENERATING SURROGATE KEYS In the customer_dim table, 3 entries are inserted with a null value for customer_sk field. The surrogate keys are automatically created
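The surrogate key behaviour described in slide 10 can be sketched as follows. This is only an illustration, assuming the key column is declared AUTO_INCREMENT; the table name customer_dim_demo, the reduced column list, and the sample customers are hypothetical and are not taken from the slides' screenshots.

```sql
-- Hypothetical, trimmed-down copy of customer_dim for experimenting in a scratch database
DROP TABLE IF EXISTS customer_dim_demo;
CREATE TABLE customer_dim_demo (
  customer_sk     INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- surrogate key
  customer_number INT,
  customer_name   VARCHAR(50),
  effective_date  DATE,
  expiry_date     DATE
);

-- Supplying NULL for customer_sk makes MySQL assign the next AUTO_INCREMENT value
INSERT INTO customer_dim_demo
  (customer_sk, customer_number, customer_name, effective_date, expiry_date)
VALUES
  (NULL, 1, 'Sample Customer One',   CURRENT_DATE, '9999-12-31'),
  (NULL, 2, 'Sample Customer Two',   CURRENT_DATE, '9999-12-31'),
  (NULL, 3, 'Sample Customer Three', CURRENT_DATE, '9999-12-31');

-- Each row now carries a generated customer_sk
SELECT customer_sk, customer_number, customer_name
FROM customer_dim_demo
ORDER BY customer_sk;
```

The natural key (customer_number) stays in the table; the generated customer_sk is what the fact table would reference.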
  • 11. CHAPTER 2-DIMENSIONAL HISTORY
  • 12. • Slowly Changing Dimension (SCD) is the technique for implementing dimension history in a dimensional data warehouse • For practice, we use the SCD1 and SCD2 techniques • Slowly Changing Dimension Type 1 (SCD1) involves updating existing rows in the customer_dim table and inserting new ones • Before updating and inserting data, a customer_stg staging table has to be created
  • 13. • The UPDATE statement copies the value of the customer_name column in the staging table to the customer_name column in the customer_dim table • The INSERT statement inserts the records in the staging table that are not yet present in the customer_dim table • Running the script updates the name of the first customer and inserts the seventh customer from the staging table into the customer_dim table.
  • 14. Creating and Loading Customer Staging Table
  • 15. Applying SCD1: Updating Existing Customers and inserting into customer_dim
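The SCD1 update-then-insert pattern from slides 12 to 15 might look roughly like the sketch below. It is not the script from the slides: the _demo table names, the reduced columns, and the sample rows are assumptions made so the example can run on its own in a scratch database.

```sql
-- Hypothetical demo tables standing in for customer_dim and customer_stg
DROP TABLE IF EXISTS customer_dim_demo, customer_stg_demo;
CREATE TABLE customer_dim_demo (
  customer_sk     INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_number INT,
  customer_name   VARCHAR(50)
);
CREATE TABLE customer_stg_demo (
  customer_number INT,
  customer_name   VARCHAR(50)
);

INSERT INTO customer_dim_demo (customer_number, customer_name)
VALUES (1, 'Old Name'), (2, 'Second Customer');
INSERT INTO customer_stg_demo
VALUES (1, 'New Name'), (2, 'Second Customer'), (3, 'Third Customer');

-- SCD1 step 1: overwrite changed names in place (no history is kept)
UPDATE customer_dim_demo a
JOIN customer_stg_demo b ON a.customer_number = b.customer_number
SET a.customer_name = b.customer_name
WHERE a.customer_name <> b.customer_name;

-- SCD1 step 2: insert staged customers that have no row in the dimension yet
INSERT INTO customer_dim_demo (customer_number, customer_name)
SELECT b.customer_number, b.customer_name
FROM customer_stg_demo b
LEFT JOIN customer_dim_demo a ON a.customer_number = b.customer_number
WHERE a.customer_number IS NULL;

SELECT * FROM customer_dim_demo ORDER BY customer_sk;
```

After the run, customer 1 carries the new name under its original surrogate key, and customer 3 appears as a new row.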
  • 16. Slowly Changing Dimension Type 2 (SCD2) • SCD2 has been applied to the product_dim table • Whenever the product_name or product_category column changes, SCD2 expires the existing row and adds a new row that describes the same product.
  • 17. Creating a product_stg file
  • 18. Applying SCD2 to the product_name and product_category in the product_dim table
  • 19. • The next output shows that SCD2 has been applied successfully • Product 1 has two rows • One of the rows, with product_sk 1, has expired, with an expiry date of 4 February 2007 • This is one day before the effective date of the new row • The other row is newly created, with product_sk 3 and a new name • Its effective date is 5 February 2007 and its expiry date is 9999-12-31 • This means that it has not yet expired
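A minimal sketch of the SCD2 steps behind slides 16 to 19 is given below. The _demo table names, the product names, the initial effective date, and the @load_date variable are hypothetical; only the column names and the 4/5 February 2007 dates come from the slides.

```sql
-- Hypothetical demo tables standing in for product_dim and product_stg
DROP TABLE IF EXISTS product_dim_demo, product_stg_demo;
CREATE TABLE product_dim_demo (
  product_sk       INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  product_code     INT,
  product_name     VARCHAR(30),
  product_category VARCHAR(30),
  effective_date   DATE,
  expiry_date      DATE
);
CREATE TABLE product_stg_demo (
  product_code     INT,
  product_name     VARCHAR(30),
  product_category VARCHAR(30)
);

INSERT INTO product_dim_demo
  (product_code, product_name, product_category, effective_date, expiry_date)
VALUES
  (1, 'Sample Product A', 'Storage', '2007-01-01', '9999-12-31'),
  (2, 'Sample Product B', 'Storage', '2007-01-01', '9999-12-31');

-- Staging data: product 1 has been renamed, product 2 is unchanged
INSERT INTO product_stg_demo
VALUES (1, 'Sample Product A (renamed)', 'Storage'),
       (2, 'Sample Product B', 'Storage');

SET @load_date = '2007-02-05';  -- assumed load date

-- SCD2 step 1: expire the current row of every product whose name or category changed
UPDATE product_dim_demo a
JOIN product_stg_demo b ON a.product_code = b.product_code
SET a.expiry_date = DATE_SUB(@load_date, INTERVAL 1 DAY)
WHERE a.expiry_date = '9999-12-31'
  AND (a.product_name <> b.product_name
       OR a.product_category <> b.product_category);

-- SCD2 step 2: add a new current row for every staged product without one
INSERT INTO product_dim_demo
  (product_code, product_name, product_category, effective_date, expiry_date)
SELECT b.product_code, b.product_name, b.product_category, @load_date, '9999-12-31'
FROM product_stg_demo b
LEFT JOIN product_dim_demo a
  ON a.product_code = b.product_code
 AND a.expiry_date = '9999-12-31'
WHERE a.product_code IS NULL;

SELECT * FROM product_dim_demo ORDER BY product_code, effective_date;
```

The old row for product 1 is kept with expiry date 2007-02-04, and a new row with a fresh surrogate key becomes the current version, so facts loaded earlier still point at the description that was valid at the time.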
  • 20. Chapter 3: Measure Additivity
  • 21. Testing Full Additivity: Inserting data into the order_dim table
  • 22. Testing Full Additivity: Inserting data into table date_dim
  • 23. Testing Full Additivity: Inserting data into sales_order_fact:
  • 24. Testing Full Additivity: Generating the sum of the total order amounts by querying across all dimensions:
  • 25. Testing Full Additivity: Generating the sum by querying across date, customer and order
  • 26. Testing Full Additivity: Generating the sum of total orders by querying across date and order:
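The idea behind the additivity test can be illustrated with a simplified sketch. Unlike the queries in the slides, which join the fact table to its dimension tables, this version groups directly on the surrogate keys of a hypothetical sales_order_fact_demo table; the sample rows are made up.

```sql
-- Hypothetical fact table with made-up rows
DROP TABLE IF EXISTS sales_order_fact_demo;
CREATE TABLE sales_order_fact_demo (
  order_sk     INT,
  customer_sk  INT,
  product_sk   INT,
  date_sk      INT,
  order_amount DECIMAL(10,2)
);
INSERT INTO sales_order_fact_demo VALUES
  (1, 1, 1, 1, 1000.00),
  (2, 2, 1, 1, 2500.00),
  (3, 1, 2, 2, 1500.00);

-- Grand total of the measure
SELECT SUM(order_amount) AS total FROM sales_order_fact_demo;

-- Subtotals per customer, summed again
SELECT SUM(t.amount) AS total_via_customer
FROM (SELECT customer_sk, SUM(order_amount) AS amount
      FROM sales_order_fact_demo
      GROUP BY customer_sk) t;

-- Subtotals per date, summed again
SELECT SUM(t.amount) AS total_via_date
FROM (SELECT date_sk, SUM(order_amount) AS amount
      FROM sales_order_fact_demo
      GROUP BY date_sk) t;

-- A fully additive measure gives the same total (5000.00 here) in all three queries
```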
  • 27. Chapter 4: Dimensional Queries
  • 28. Aggregate Queries • Aggregate queries aggregate individual facts • The values are either summed or counted • Two examples are run here: aggregation of daily sales and of annual sales • In all cases, joins between tables are done using surrogate keys.
  • 29. Daily Sales Aggregation: The aggregation of the order amounts and number of orders is done by date
  • 30. Annual Sales Aggregation: The order amounts and the number of orders are not only aggregated by date, but also by product and customer city
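A daily aggregation of the kind described in slide 29 could be sketched as below. The _demo tables and their rows are hypothetical, and only a date dimension is joined, so this is a reduced version of the idea rather than the query shown in the slides.

```sql
-- Hypothetical demo tables
DROP TABLE IF EXISTS sales_order_fact_demo, date_dim_demo;
CREATE TABLE date_dim_demo (
  date_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `date`  DATE
);
CREATE TABLE sales_order_fact_demo (
  order_sk     INT,
  date_sk      INT,
  order_amount DECIMAL(10,2)
);
INSERT INTO date_dim_demo (`date`) VALUES ('2007-02-05'), ('2007-02-06');
INSERT INTO sales_order_fact_demo VALUES
  (1, 1, 1000.00),
  (2, 1, 2500.00),
  (3, 2, 1500.00);

-- Join on the surrogate key, then sum amounts and count orders per calendar date
SELECT d.`date`,
       SUM(f.order_amount) AS total_amount,
       COUNT(*)            AS order_count
FROM sales_order_fact_demo f
JOIN date_dim_demo d ON f.date_sk = d.date_sk
GROUP BY d.`date`
ORDER BY d.`date`;
```

An annual version would group by YEAR(d.`date`) instead, plus whatever product and customer attributes are wanted.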
  • 31. Specific Queries: Monthly Storage Product Sales: The following query aggregates sales amount and the number of orders per month.
  • 32. Specific Queries: Quarterly Sales in Mechanicsburg: The following query produces the quarterly aggregation of the order amounts in Mechanicsburg
  • 33. Inside-Out Queries: Product Performer: The following query gives you the sales orders of products that have a monthly sales amount of 7,500 or more.
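An inside-out query filters on the aggregated fact value rather than on a dimension attribute, which in SQL means a HAVING clause. The sketch below applies the 7,500 threshold from slide 33 to hypothetical _demo tables with made-up rows; deriving year and month with YEAR() and MONTH() is also an assumption, since the slides do not show the date_dim columns.

```sql
-- Hypothetical demo tables
DROP TABLE IF EXISTS sales_order_fact_demo, product_dim_demo, date_dim_demo;
CREATE TABLE product_dim_demo (
  product_sk   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  product_name VARCHAR(30)
);
CREATE TABLE date_dim_demo (
  date_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `date`  DATE
);
CREATE TABLE sales_order_fact_demo (
  product_sk   INT,
  date_sk      INT,
  order_amount DECIMAL(10,2)
);
INSERT INTO product_dim_demo (product_name) VALUES ('Sample Product A'), ('Sample Product B');
INSERT INTO date_dim_demo (`date`) VALUES ('2007-01-10'), ('2007-01-20');
INSERT INTO sales_order_fact_demo VALUES
  (1, 1, 5000.00),
  (1, 2, 3000.00),
  (2, 1, 3000.00);

-- Keep only product/month combinations whose summed amount reaches 7,500
SELECT p.product_name,
       YEAR(d.`date`)      AS sales_year,
       MONTH(d.`date`)     AS sales_month,
       SUM(f.order_amount) AS monthly_amount,
       COUNT(*)            AS order_count
FROM sales_order_fact_demo f
JOIN product_dim_demo p ON f.product_sk = p.product_sk
JOIN date_dim_demo d    ON f.date_sk = d.date_sk
GROUP BY p.product_name, YEAR(d.`date`), MONTH(d.`date`)
HAVING SUM(f.order_amount) >= 7500;
```

With these sample rows only Sample Product A qualifies, since its January total is 8000.00.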
  • 34. Inside-Out Queries: Loyal Customer: The following query shows customers who have placed more than five orders annually in the past 18 months.
  • 35. Chapter 5: Source Extraction
  • 36. • Push-by-source CDC (change data capture) means that the source system extracts only the changes made since the last extraction and pushes them to the data warehouse • Push-by-source CDC is demonstrated on the sales order source data • It is done using a stored procedure that extracts sales order data from the sales_order table in the source database
  • 37. Creating a sales_order table in another database called source and inserting values in the tables Order_dim:
  • 38. Date_dim
  • 39. Inserting values in the order_dim and date_dim tables in the dw database
  • 40. Running the following stored procedure:
      USE source;
      DELIMITER //
      DROP PROCEDURE IF EXISTS push_sales_order //
      CREATE PROCEDURE push_sales_order()
      BEGIN
        -- Push today's source orders into the fact table, resolving surrogate keys
        INSERT INTO dw.sales_order_fact
        SELECT a.order_amount, b.order_sk, c.customer_sk, d.product_sk, e.date_sk
        FROM sales_order a, dw.order_dim b, dw.customer_dim c, dw.product_dim d, dw.date_dim e
        WHERE a.entry_date = CURRENT_DATE
          AND a.order_number = b.order_number
          AND a.customer_number = c.customer_number
          AND a.product_code = d.product_code
          -- pick the product version that was valid on the order date
          AND a.order_date >= d.effective_date
          AND a.order_date <= d.expiry_date
          AND a.order_date = e.date;
      END //
      DELIMITER ;
  • 41. The above stored procedure will make changes to the sales_order_fact table in dw database.
  • 42. Chapter 6 : Populating the Date Dimension
  • 43. Pre-population: This is the simplest of the three techniques: dates are inserted in advance for a whole period of time. For example, dates could be inserted for the 5 years between 2009 and 2014. Truncating the date_dim table:
  • 44. One Date Every Day: This technique is similar to pre-population, but only one date is loaded into the date dimension each day. Daily date population:
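The two techniques in slides 43 and 44 can be sketched as follows. The table date_dim_demo, the procedure name pre_populate_date_demo, and the chosen date range (2009-01-01 to 2013-12-31, one reading of "5 years between 2009 and 2014") are assumptions for illustration. DELIMITER is a mysql command-line client directive, so the script is meant to be run from that client.

```sql
-- Hypothetical stand-in for date_dim
DROP TABLE IF EXISTS date_dim_demo;
CREATE TABLE date_dim_demo (
  date_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `date`  DATE
);

DELIMITER //
DROP PROCEDURE IF EXISTS pre_populate_date_demo //
CREATE PROCEDURE pre_populate_date_demo(IN start_dt DATE, IN end_dt DATE)
BEGIN
  -- Walk the range one day at a time and insert each date
  DECLARE dt DATE;
  SET dt = start_dt;
  WHILE dt <= end_dt DO
    INSERT INTO date_dim_demo (`date`) VALUES (dt);
    SET dt = DATE_ADD(dt, INTERVAL 1 DAY);
  END WHILE;
END //
DELIMITER ;

-- Pre-population: load the whole period in one go
CALL pre_populate_date_demo('2009-01-01', '2013-12-31');

-- One date every day: a statement like this would instead be scheduled to run once per day
INSERT INTO date_dim_demo (`date`) VALUES (CURRENT_DATE);
```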
  • 45. Loading dates from the source: The query loads the sales order dates from the sales_order table of the source database into the date_dim table of the dw database.
  • 46. Adding more dates from the additional sales order
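Loading dates from the source, and then adding more when further sales orders arrive, might look like the sketch below. The _demo tables and sample orders are hypothetical, and both tables live in one scratch database here rather than in separate source and dw databases. The anti-join keeps the load repeatable: dates already present are skipped.

```sql
-- Hypothetical demo tables
DROP TABLE IF EXISTS sales_order_demo, date_dim_demo;
CREATE TABLE sales_order_demo (
  order_number INT,
  order_date   DATE,
  order_amount DECIMAL(10,2)
);
CREATE TABLE date_dim_demo (
  date_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `date`  DATE
);

INSERT INTO sales_order_demo VALUES
  (1, '2007-02-05', 1000.00),
  (2, '2007-02-05', 2500.00),
  (3, '2007-02-06', 1500.00);

-- First load: every distinct order date that is not yet in the date dimension
INSERT INTO date_dim_demo (`date`)
SELECT DISTINCT s.order_date
FROM sales_order_demo s
LEFT JOIN date_dim_demo d ON d.`date` = s.order_date
WHERE d.date_sk IS NULL;

-- An additional sales order arrives with a new date
INSERT INTO sales_order_demo VALUES (4, '2007-02-07', 4000.00);

-- Running the same statement again adds only the new date
INSERT INTO date_dim_demo (`date`)
SELECT DISTINCT s.order_date
FROM sales_order_demo s
LEFT JOIN date_dim_demo d ON d.`date` = s.order_date
WHERE d.date_sk IS NULL;

SELECT * FROM date_dim_demo ORDER BY `date`;
```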
  • 47. Chapter 7: Initial Population
  • 48. Initial Population: After identifying the source data, a script is written for initial population Order_dim table:
  • 49. Sales_order_fact table
  • 50. Running the Initial Population Scheme: Truncating the sales_order table:
  • 51. Running the Initial Population Scheme: Preparing the sales_order table:
  • 52. Query to confirm that the sales orders were loaded correctly:
  • 53. Chapter 8: Regular Population
  • 54. Regular Population Script: In this script, customer_dim and product_dim are reloaded with data. SCD2 is applied to customer addresses, product names, and product groups. SCD1 is applied to customer names. Order_dim table:
  • 55. Product_dim:
  • 56. Product_stg:
  • 57. Sales_order_fact:
  • 58. Customer_dim:
  • 59. Testing Data: The data is tested by running the select query on the sales_order table and the sales_order_fact table
  • 60. Chapter 10: Adding Columns
  • 61. Adding New Columns to the Customer Dimension: When the two new columns are added, NULL values are displayed for them in the existing rows. Customer_dim table:
  • 62. Customer_stg table:
  • 63. Adding the order_quantity column to the sales_order table
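Adding columns to existing tables, as in slides 61 to 63, can be sketched with ALTER TABLE. The slides do not name the two new customer columns, so demo_attribute_1 and demo_attribute_2 are placeholders, and the _demo tables and sample rows are hypothetical; only order_quantity comes from the slides.

```sql
-- Hypothetical demo tables with one existing row each
DROP TABLE IF EXISTS customer_dim_demo, sales_order_demo;
CREATE TABLE customer_dim_demo (
  customer_sk     INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_number INT,
  customer_name   VARCHAR(50)
);
CREATE TABLE sales_order_demo (
  order_number INT,
  order_amount DECIMAL(10,2)
);
INSERT INTO customer_dim_demo (customer_number, customer_name)
VALUES (1, 'Sample Customer One');
INSERT INTO sales_order_demo VALUES (1, 1000.00);

-- Add two placeholder columns to the customer dimension
ALTER TABLE customer_dim_demo
  ADD COLUMN demo_attribute_1 VARCHAR(50),
  ADD COLUMN demo_attribute_2 VARCHAR(50);

-- Add the order_quantity column to the sales order table
ALTER TABLE sales_order_demo
  ADD COLUMN order_quantity INT;

-- Rows that existed before the change show NULL in the new columns
SELECT * FROM customer_dim_demo;
SELECT * FROM sales_order_demo;
```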