Ronalao termpresent

Published in: Education

  1. ETL and OLAP Cube Reporting Using the NetFlix OLTP Database
     By: Rona Charlene Lao
  2. Introduction
     This project builds a Data Warehouse database from the Netflix database used in the first week's assignment.
     Objectives:
     • Provide an end-to-end solution for uploading transactional data into the Data Warehouse.
     • Provide dynamic reports for NetFlix showing various representations of their aggregated data based on Rental, Shipment, Payment and DVD Inventory.
     • Demonstrate how OLAP is used to provide dynamic multidimensional reports.
  3. Scope
     • Create mock-up data to be uploaded into the Data Warehouse.
     • Build a complete end-to-end ETL solution.
     • Use SQL*Loader, stored procedures and triggers to implement business transformation rules from the Staging Area to the Target Area.
     • Create canned reports and demonstrate how Data Warehouses can provide dynamic multidimensional reports.
  4. Out of Scope
     • Building the OLTP database from scratch.
     • Coding all business and functional rules related to Netflix data storage and operational requirements.
  5. Tools and Environment
  6. Process Flow
  7. Process Flow - Extract
     • SQL Queries: run against the NetFlix OLTP Database to extract the data for the dimension tables. The extracts were saved as CSV files.
     • SQL*Loader: used to upload the CSV files into the Staging Area of the DW database.
     • Stored Procedures: used to extract data for the Member and DVD dimension tables and for the fact tables. The fact table stored procedures take two parameters, startdt and enddt.
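
The date-bounded extract described on this slide can be sketched as follows. This is a minimal illustration, not the project's stored procedure: it uses Python's built-in sqlite3 module as a stand-in for Oracle, the `rental` table and its columns are hypothetical, and treating the window as `startdt <= rental_date < enddt` is an assumption (the slide only names the two parameters).

```python
import csv
import io
import sqlite3

def extract_rentals(conn, startdt, enddt, out):
    """Write rentals with startdt <= rental_date < enddt to `out` as CSV.

    Table and column names are hypothetical; the half-open date window
    is an assumption, since the slides only name the two parameters.
    """
    cur = conn.execute(
        "SELECT rental_id, member_id, dvd_id, rental_date "
        "FROM rental WHERE rental_date >= ? AND rental_date < ? "
        "ORDER BY rental_id",
        (startdt, enddt),
    )
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # header row
    rows = cur.fetchall()
    writer.writerows(rows)
    return len(rows)

# Tiny in-memory stand-in for the OLTP database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, member_id INTEGER, "
    "dvd_id INTEGER, rental_date TEXT)"
)
conn.executemany(
    "INSERT INTO rental VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, "2009-01-15"),
        (2, 11, 101, "2009-02-03"),
        (3, 10, 102, "2009-03-20"),
    ],
)

buf = io.StringIO()
extracted = extract_rentals(conn, "2009-02-01", "2009-04-01", buf)
```

Passing the window as bound parameters rather than pasting it into the SQL text keeps the same query reusable for every incremental run.
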
  8. Process Flow - Extract
     • Control File
     • SQL*Loader
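
SQL*Loader is driven by a control file that says which CSV file feeds which staging table and how fields map to columns. The slides do not reproduce the control file, so the sketch below only mimics the idea in Python with sqlite3: the `CONTROL` dictionary, the `load_csv` helper, and the `STG_MEMBER_DIM` column names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical mapping that plays the role of a SQL*Loader control file:
# which staging table to fill and which CSV field goes to which column.
CONTROL = {
    "table": "STG_MEMBER_DIM",
    "columns": ["member_id", "first_name", "last_name"],
}

def load_csv(conn, control, csv_file):
    """Append every data row of csv_file to the staging table named in control."""
    columns = ", ".join(control["columns"])
    placeholders = ", ".join("?" for _ in control["columns"])
    sql = f"INSERT INTO {control['table']} ({columns}) VALUES ({placeholders})"
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    rows = [row for row in reader if row]  # ignore blank lines
    conn.executemany(sql, rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STG_MEMBER_DIM (member_id INTEGER, first_name TEXT, last_name TEXT)"
)

# Stand-in for a CSV extract file on disk.
extract = io.StringIO("member_id,first_name,last_name\n1,Ann,Lee\n2,Bob,Cruz\n")
loaded = load_csv(conn, CONTROL, extract)
staged = conn.execute(
    "SELECT member_id, last_name FROM STG_MEMBER_DIM ORDER BY member_id"
).fetchall()
```
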
  9. Process Flow - Transform
     • After the stored procedure for the DVD extract executes, the V_DVD materialized view is refreshed (force).
     • T_STAR_DIM is also updated automatically through a trigger once the STG_MOVIEPERSONROLE_DIM table is populated.
     • The T_STAR_DIM table is a denormalized version of the MOVIEPERSONROLE table.
     • T_MEMBER_DIM is also a denormalized version of a source table.
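
The trigger-driven denormalization can be illustrated with a small sketch. It assumes, without confirmation from the slides, that the staging row carries person and role keys and that the trigger resolves them into names; `PERSON_REF`, `ROLE_REF`, the trigger name, and every column name are hypothetical. SQLite (via Python's sqlite3) stands in for Oracle here, and SQLite triggers fire once per inserted row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical lookup tables holding the normalized names.
    CREATE TABLE PERSON_REF (person_id INTEGER PRIMARY KEY, person_name TEXT);
    CREATE TABLE ROLE_REF (role_id INTEGER PRIMARY KEY, role_name TEXT);

    -- Staging table named in the slides; its columns are assumed.
    CREATE TABLE STG_MOVIEPERSONROLE_DIM (
        movie_id INTEGER, person_id INTEGER, role_id INTEGER
    );

    -- Denormalized target: names are stored inline instead of keys.
    CREATE TABLE T_STAR_DIM (movie_id INTEGER, person_name TEXT, role_name TEXT);

    -- Each new staging row is flattened into T_STAR_DIM as it arrives.
    CREATE TRIGGER trg_fill_star_dim
    AFTER INSERT ON STG_MOVIEPERSONROLE_DIM
    BEGIN
        INSERT INTO T_STAR_DIM (movie_id, person_name, role_name)
        SELECT NEW.movie_id, p.person_name, r.role_name
        FROM PERSON_REF p
        JOIN ROLE_REF r ON r.role_id = NEW.role_id
        WHERE p.person_id = NEW.person_id;
    END;

    INSERT INTO PERSON_REF VALUES (7, 'Person A');
    INSERT INTO ROLE_REF VALUES (2, 'Actor');
    """
)

# Populating the staging table is all the caller does; the trigger does the rest.
conn.execute("INSERT INTO STG_MOVIEPERSONROLE_DIM VALUES (1, 7, 2)")
star_rows = conn.execute(
    "SELECT movie_id, person_name, role_name FROM T_STAR_DIM"
).fetchall()
```
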
  10. Process Flow – Load
     • The stored procedure POP_TARGET_SP moves the data from the Staging Area (STG_) to its corresponding table in the Target Area (T_) within the DW database.
     • It takes only the records that are not already in the Target Area.
     • It processes only a subset of the data on each run while preserving the historical data in the Target Fact Tables (T_*_F).
     • It uses NOT IN statements to ensure that there is no duplication.
     • The loads are listed in sequence so that they abide by the integrity constraints set up in the Target Area.
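
The NOT IN pattern behind this load step can be sketched as below. This is not the body of POP_TARGET_SP, which the slides do not show: it is a minimal Python/sqlite3 stand-in, and `STG_RENTAL_F`, `T_RENTAL_F`, their columns, and the `pop_target` helper are hypothetical names that merely follow the STG_ / T_*_F naming described above.

```python
import sqlite3

def pop_target(conn):
    """Copy staging rows whose key is not yet in the target; return rows added."""
    before = conn.execute("SELECT COUNT(*) FROM T_RENTAL_F").fetchone()[0]
    conn.execute(
        "INSERT INTO T_RENTAL_F (rental_id, member_id, amount) "
        "SELECT rental_id, member_id, amount FROM STG_RENTAL_F "
        "WHERE rental_id NOT IN (SELECT rental_id FROM T_RENTAL_F)"
    )
    after = conn.execute("SELECT COUNT(*) FROM T_RENTAL_F").fetchone()[0]
    return after - before

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STG_RENTAL_F (rental_id INTEGER PRIMARY KEY, member_id INTEGER, amount REAL)"
)
conn.execute(
    "CREATE TABLE T_RENTAL_F (rental_id INTEGER PRIMARY KEY, member_id INTEGER, amount REAL)"
)

# First run: two staged facts, both new to the target.
conn.executemany("INSERT INTO STG_RENTAL_F VALUES (?, ?, ?)", [(1, 10, 4.5), (2, 11, 3.0)])
first_run = pop_target(conn)

# Incremental run: one more staged fact; the two earlier rows are skipped.
conn.execute("INSERT INTO STG_RENTAL_F VALUES (3, 10, 5.0)")
second_run = pop_target(conn)

# Re-running with nothing new adds nothing, so history is left untouched.
third_run = pop_target(conn)
total = conn.execute("SELECT COUNT(*) FROM T_RENTAL_F").fetchone()[0]
```

One caveat with this pattern: if the NOT IN subquery can return a NULL key, the filter matches no rows at all, so the compared column should be declared NOT NULL (here it is the primary key).
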
  11. Database Diagram - NetFlix
  12. Database Diagram - DW
  13. OLAP Cubes and Reporting
     • 3 Cubes: Rental Cube, DVD Cube, Payment Cube
     • Reports: Dashboard; Microsoft Excel – Pivot Tables using Offline Cubes
  14. Rental-DVD Cube
     This cube is a virtual cube, a combination of the Rental cube and the DVD cube.
     • Rental Cube
     • DVD Cube
  15. Rental-DVD Cube
     Dimensions and Measures
  16. Rental-DVD Dashboard
  17. Payment Cube
     • Starflake schema
     • Outer join on T_MEMBER_DIM
     • Calculated Measure
     • Example of a Data Warehouse constraint
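
An outer join against T_MEMBER_DIM combined with a calculated measure might look like the sketch below. The slide does not say which side of the join is preserved or what the calculated measure is, so both are assumptions here: members are kept even when they have no payments, and the measure is the average payment per member. `T_PAYMENT_F` and all column names are hypothetical, and sqlite3 stands in for the cube's underlying database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_MEMBER_DIM (member_id INTEGER PRIMARY KEY, member_name TEXT)")
conn.execute(
    "CREATE TABLE T_PAYMENT_F (payment_id INTEGER PRIMARY KEY, member_id INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO T_MEMBER_DIM VALUES (?, ?)", [(1, "Member A"), (2, "Member B")])
conn.executemany(
    "INSERT INTO T_PAYMENT_F VALUES (?, ?, ?)",
    [(100, 1, 10.0), (101, 1, 20.0)],  # Member B has no payments
)

# Stored measures: payment count and total, aggregated per member.
# The LEFT OUTER JOIN keeps members that have no matching payment rows.
rows = conn.execute(
    "SELECT m.member_id, m.member_name, "
    "       COUNT(p.payment_id) AS payment_count, "
    "       COALESCE(SUM(p.amount), 0) AS total_paid "
    "FROM T_MEMBER_DIM m "
    "LEFT OUTER JOIN T_PAYMENT_F p ON p.member_id = m.member_id "
    "GROUP BY m.member_id, m.member_name "
    "ORDER BY m.member_id"
).fetchall()

# Calculated measure: derived from the stored measures at query time,
# left undefined (None) where there are no payments to average.
report = [
    (member_id, name, count, total, (total / count) if count else None)
    for member_id, name, count, total in rows
]
```

With an inner join, Member B would disappear from the report entirely; the outer join keeps the row and leaves the derived measure empty.
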
  18. Payment Cube
     Dimensions and Measures
  19. Payment Cube Dashboard and Report
  20. Incremental Load
     • Created mock-up data
     • Performed CSV extracts
     • Ran SQL*Loader
     • Ran the stored procedures that populate the Staging Area
     • Ran the stored procedure that populates the Target Area
     • Refreshed the online cubes
     • Recreated the offline cubes
  21. Demo
     Please see the demo.avi file in the ronalao_term.zip file.
  22. Sources/References
     • CS779 NetFlix_Oracle_Inserts.sql
     • CS779 Netflix_Oracle_Create_Indexes.sql
     • CS779 NetFlix_Oracle_Create_Tables.sql
     • OLAP Cube 3.0: http://www.adersoft.com
     • http://msdn.microsoft.com/en-us/library/aa216377(SQL80).aspx
     • http://e-articles.info/e/a/title/Dashboard-Report/
     • http://camstudio.org
  23. Thank you
     Good luck in the final exams!