Data Warehousing

Published on

June 10, 2010 BDPA Charlotte Program Meeting Presentation.

Presenter:
Markus Beamer, BDPA Charlotte President Elect

Topic:
Intelligent Data Strategies - Intro to Data Marts and Data Warehouses

Data Warehousing

  1. Intelligent Data Strategies: An Intro to Data Marts and Data Warehouses
     Markus Beamer, BDPA-Charlotte (www.bdpa-charlotte.org)
     Also available on mobeamer.blogger.com
  2. Data Warehouse or Data Mart
     There are many definitions of data warehouses and data marts, but no single standard one. For our purposes we will define them as follows:

     Data Warehouse
       • Extreme volume: contains years of daily information at the lowest grain possible.
       • Corporate wide: the grouping of data elements is dictated by the corporate structure.
       • Facts and dimensions: this system is typically made up of facts.
       • Serves data to data marts.
       • Will have data that is shared across groups.

     Data Mart
       • Specific volume sets: may only contain month-to-date information.
       • Specific: data is grouped by the needs of the team or group building the solution.
       • Many metrics: has many metric tables and rollup grains.
       • Serves data to reports.
       • Will have data specific to only the implementation group.
  3. Traditional Reporting Solutions
     Diagram labels: many systems performing many different business functions; many reports from the multiple sources; analyst(s); intelligence.
     Human intervention is needed to “make sense” of the different reports. This opens the door for:
       • Conflicting numbers
       • Human error
       • Misunderstood data
       • Inefficiency
  4. The Disorganized Closet
       • Like a disorganized closet
       • The data is there, but do you know that you have that special shirt you really need on Friday?
  5. Organize Your Data Closet
       • It takes time
       • It takes discipline
       • It takes a planned approach
  6. Simple Data Warehousing
     Diagram labels: many systems performing many different business functions; a centralized shared location for all the data; intelligence.
       • Automated reporting that understands all sources.
       • Reports specific to each system can still be delivered.
  7. Adding Data Marts
     Diagram labels: Data Mart; Data Mart; intelligence.
       • Automated reporting that understands all sources can be delivered.
       • Reports specific to each system can still be delivered.
  8. Loading the Warehouse
     Extract, Transform, Load (ETL) is a process used to get data into your warehouse. The typical chain of events is as follows:
       • Your front-end system, usually a transactional system, sends data and information to its relational database.
       • At certain intervals (nightly, hourly or otherwise), a single data file is extracted and delivered to a specified location.
       • On detecting a new file, the warehouse transforms this information and loads it into its standard model.
     Diagram labels:
       • The Website: a normal website where customers can come and order items from your company.
       • Website DB: your standard relational database system. Tracks a lot of information.
       • Data: a single data file containing all orders made by customers that day.
       • Warehouse: all this data is stored in a single table called “Orders”.
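The extract, transform and load steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV layout, the in-memory SQLite database standing in for the warehouse, and the function names `extract`, `transform` and `load` are all hypothetical, and the slide does not say which tools were actually used.

```python
import csv
import io
import sqlite3

# Hypothetical daily extract. The slide only says "a single data file";
# the CSV layout here is an assumption based on the source table on slide 10.
DAILY_FILE = """Name,CustomerID,Product,Price,Date
Markus,1001,Airplane,19.00,06/25/2010
John,1010,Car,10.00,06/28/2010
"""

def extract(text):
    """Extract: read the delivered file into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast types and convert MM/DD/YYYY dates to ISO format."""
    out = []
    for r in rows:
        month, day, year = r["Date"].split("/")
        out.append((r["Name"], int(r["CustomerID"]), r["Product"],
                    float(r["Price"]), f"{year}-{month}-{day}"))
    return out

def load(conn, rows):
    """Load: append the transformed rows to the warehouse "Orders" table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Orders "
        "(Name TEXT, CustomerID INTEGER, Product TEXT, Price REAL, Date TEXT)"
    )
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

# An in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(DAILY_FILE)))

loaded = conn.execute(
    "SELECT Name, Price, Date FROM Orders ORDER BY Date"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes it easy to rerun only the load when a file has to be reprocessed.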
  9. Components of a Data Warehouse
     Diagram labels: External Sources; Stage; Data Marts; Reports.
  10. Facts
       • Source: a source record is a one-to-one match of the data delivered in the data file during the ETL process.
       • Fact: a fact is a single measurable piece of data. Fact tables will not typically contain text fields. They will also always have a date associated with them, which represents when that fact was recorded.

     Source table:
       Name    CustomerID  Product   Price  Date
       Markus  1001        Airplane  19.00  06/25/2010
       John    1010        Car       10.00  06/28/2010

     Fact table:
       Date        CustomerDimID  ProductDimID  Price
       06/25/2010  1              101           19.00
       06/28/2010  2              102           10.00
  11. Dimensions
       • Dimension: a dimension is an attribute found within the source data. In a perfect warehouse all text elements would be turned into dimensions. You may even do this to numeric values. Dimensions will typically speed up your reporting processes.
       • A conformed dimension table is a dimension table that is shared throughout all of the data marts within your warehouse. For example, a customer, product or employee dimension might be considered a core dimension.

     Customer dimension:
       CustDimID  CustID  Name
       1          1001    Markus
       2          1010    John

     Product dimension:
       ProductDimID  Name
       101           Airplane
       102           Car
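As a rough sketch of how source rows become fact and dimension rows, the Python below looks up (or creates) a surrogate key for each text value and stores only keys and measures in the fact table. The table name `OrdersFact`, the helper functions and the third source row are assumptions made for illustration. SQLite assigns the surrogate keys starting at 1, so the product keys differ from the 101 and 102 shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerDim (CustDimID INTEGER PRIMARY KEY, CustID INTEGER UNIQUE, Name TEXT);
CREATE TABLE ProductDim  (ProductDimID INTEGER PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE OrdersFact  (Date TEXT, CustomerDimID INTEGER, ProductDimID INTEGER, Price REAL);
""")

# Source rows as delivered by the ETL file. The third row is hypothetical,
# added to show an existing dimension row being reused.
source_rows = [
    ("Markus", 1001, "Airplane", 19.00, "2010-06-25"),
    ("John", 1010, "Car", 10.00, "2010-06-28"),
    ("Markus", 1001, "Car", 10.00, "2010-06-28"),
]

def customer_dim_id(cust_id, name):
    """Return the surrogate key for a customer, creating the row if needed."""
    row = conn.execute(
        "SELECT CustDimID FROM CustomerDim WHERE CustID = ?", (cust_id,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO CustomerDim (CustID, Name) VALUES (?, ?)", (cust_id, name)
    )
    return cur.lastrowid

def product_dim_id(name):
    """Return the surrogate key for a product, creating the row if needed."""
    row = conn.execute(
        "SELECT ProductDimID FROM ProductDim WHERE Name = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute("INSERT INTO ProductDim (Name) VALUES (?)", (name,))
    return cur.lastrowid

# The fact table keeps only the date, the dimension keys and the measure.
for name, cust_id, product, price, day in source_rows:
    conn.execute(
        "INSERT INTO OrdersFact VALUES (?, ?, ?, ?)",
        (day, customer_dim_id(cust_id, name), product_dim_id(product), price),
    )

customers = conn.execute(
    "SELECT CustDimID, CustID, Name FROM CustomerDim ORDER BY CustDimID"
).fetchall()
facts = conn.execute(
    "SELECT Date, CustomerDimID, ProductDimID, Price FROM OrdersFact ORDER BY rowid"
).fetchall()
print(customers)
print(facts)
```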
  12. Time Sensitive Dimensions
       • Slowly changing dimensions allow for time-sensitive data tracking.
       • Simply add start and end dates to each dimension table.
       • This will impact your loading and transformation processes.

     Example: two customers, one in NC and one in SC. Markus moves to SC in February. You can still report accurate NC sales for January because of the start and end dates.

     Before the move:
       CustDimID  CustID  Name    State  Start       End
       1          1001    Markus  NC     01/01/2010  01/01/2070
       2          1010    John    SC     01/01/2010  01/01/2070

     After the move:
       CustDimID  CustID  Name    State  Start       End
       1          1001    Markus  NC     01/01/2010  01/31/2010
       2          1010    John    SC     01/01/2010  01/01/2070
       3          1001    Markus  SC     02/01/2010  01/01/2070
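The start and end date technique on this slide can be sketched as follows. Assumptions: the columns are renamed `StartDate` and `EndDate` because `END` is an SQL keyword, dates are stored as ISO strings so they compare correctly as text, and `move_customer` and `state_on` are hypothetical helper names.

```python
import sqlite3
from datetime import date, timedelta

OPEN_END = "2070-01-01"  # far-future end date marking the current row, as on the slide

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerDim (CustDimID INTEGER PRIMARY KEY, CustID INTEGER, "
    "Name TEXT, State TEXT, StartDate TEXT, EndDate TEXT)"
)
conn.executemany(
    "INSERT INTO CustomerDim (CustID, Name, State, StartDate, EndDate) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1001, "Markus", "NC", "2010-01-01", OPEN_END),
     (1010, "John", "SC", "2010-01-01", OPEN_END)],
)

def move_customer(cust_id, new_state, effective):
    """Close the customer's current row and open a new one from `effective`."""
    last_day = (date.fromisoformat(effective) - timedelta(days=1)).isoformat()
    name = conn.execute(
        "SELECT Name FROM CustomerDim WHERE CustID = ? AND EndDate = ?",
        (cust_id, OPEN_END),
    ).fetchone()[0]
    conn.execute(
        "UPDATE CustomerDim SET EndDate = ? WHERE CustID = ? AND EndDate = ?",
        (last_day, cust_id, OPEN_END),
    )
    conn.execute(
        "INSERT INTO CustomerDim (CustID, Name, State, StartDate, EndDate) "
        "VALUES (?, ?, ?, ?, ?)",
        (cust_id, name, new_state, effective, OPEN_END),
    )

def state_on(cust_id, day):
    """Return the state the customer lived in on the given ISO date."""
    return conn.execute(
        "SELECT State FROM CustomerDim "
        "WHERE CustID = ? AND ? BETWEEN StartDate AND EndDate",
        (cust_id, day),
    ).fetchone()[0]

# Markus moves to SC on February 1st.
move_customer(1001, "SC", "2010-02-01")

rows = conn.execute(
    "SELECT CustDimID, CustID, State, StartDate, EndDate "
    "FROM CustomerDim ORDER BY CustDimID"
).fetchall()
print(rows)
print(state_on(1001, "2010-01-15"), state_on(1001, "2010-02-15"))
```

The loader now has to decide, for every incoming customer record, whether to reuse the current row or close it and open a new one, which is the loading impact the slide mentions.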
  13. Hierarchies
       • Hierarchies are dimension tables that reflect parent-to-child relationships. Typically a hierarchy table will be used to “roll up” metrics to different levels.
       • We can turn the product dimension table into a hierarchy by adding a parent product code.

     For example: the Airplane and Car both belong to the Toys product line. This hierarchy could be used to roll up and produce all Toy sales.

       ProductDimID  Name      ProductCode  ParentCode
       100           Toys      T            -
       101           Airplane  A1           T
       102           Car       C1           T
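A minimal sketch of the rollup described on this slide joins the product dimension to itself through the parent code. The fact table name `OrdersFact` is an assumption, the fact rows are taken from slide 10, and the query handles only a single parent level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductDim (ProductDimID INTEGER PRIMARY KEY, Name TEXT,
                         ProductCode TEXT, ParentCode TEXT);
INSERT INTO ProductDim VALUES (100, 'Toys', 'T', NULL);
INSERT INTO ProductDim VALUES (101, 'Airplane', 'A1', 'T');
INSERT INTO ProductDim VALUES (102, 'Car', 'C1', 'T');

CREATE TABLE OrdersFact (Date TEXT, CustomerDimID INTEGER,
                         ProductDimID INTEGER, Price REAL);
INSERT INTO OrdersFact VALUES ('2010-06-25', 1, 101, 19.00);
INSERT INTO OrdersFact VALUES ('2010-06-28', 2, 102, 10.00);
""")

# Join each fact to its product, then to that product's parent row,
# and total the sales at the parent (product line) level.
rollup = conn.execute("""
    SELECT parent.Name, SUM(f.Price)
    FROM OrdersFact AS f
    JOIN ProductDim AS child  ON child.ProductDimID = f.ProductDimID
    JOIN ProductDim AS parent ON parent.ProductCode = child.ParentCode
    GROUP BY parent.Name
""").fetchall()
print(rollup)
```

A deeper hierarchy would need either one join per level or a recursive query.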
  14. Facts and Dimensions
       • Notice how this fact table has relations to the dimension tables. This allows us to “pivot” the facts around each dimension in an efficient manner.

     Customer dimension:
       CustDimID  CustID  Name
       1          1001    Markus
       2          1010    John

     Product dimension:
       ProductDimID  Name
       101           Airplane
       102           Car

     Fact table:
       Date        CustomerDimID  ProductDimID  Price
       06/25/2010  1              101           19.00
       06/28/2010  2              102           10.00
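To make the “pivot” idea concrete, the sketch below totals the same fact table first by customer and then by product, changing only the dimension that is joined. The helper `total_by`, the table name `OrdersFact` and the third fact row are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerDim (CustDimID INTEGER PRIMARY KEY, CustID INTEGER, Name TEXT);
INSERT INTO CustomerDim VALUES (1, 1001, 'Markus');
INSERT INTO CustomerDim VALUES (2, 1010, 'John');

CREATE TABLE ProductDim (ProductDimID INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO ProductDim VALUES (101, 'Airplane');
INSERT INTO ProductDim VALUES (102, 'Car');

CREATE TABLE OrdersFact (Date TEXT, CustomerDimID INTEGER,
                         ProductDimID INTEGER, Price REAL);
INSERT INTO OrdersFact VALUES ('2010-06-25', 1, 101, 19.00);
INSERT INTO OrdersFact VALUES ('2010-06-28', 2, 102, 10.00);
-- Hypothetical extra order so the two views give different groupings.
INSERT INTO OrdersFact VALUES ('2010-06-28', 1, 102, 10.00);
""")

def total_by(dim_table, dim_key, fact_key):
    """Total Price per dimension Name; identifiers come from the fixed calls below."""
    sql = (
        f"SELECT d.Name, SUM(f.Price) FROM OrdersFact AS f "
        f"JOIN {dim_table} AS d ON d.{dim_key} = f.{fact_key} "
        f"GROUP BY d.Name ORDER BY d.Name"
    )
    return conn.execute(sql).fetchall()

# Same facts, two different dimensions.
by_customer = total_by("CustomerDim", "CustDimID", "CustomerDimID")
by_product = total_by("ProductDim", "ProductDimID", "ProductDimID")
print(by_customer)
print(by_product)
```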
  15. Metrics
       • Metric: a metric is an aggregation of fact information, usually around a particular set of dimensions. In a typical environment a metric table becomes the source for a single report. But because of the dimensions, metrics can be combined across multiple systems.
       • These metric tables are combined with their dimensions to produce the actual output of the reports.

     Product Metric Table
     For example: the Product Metric table might be used to show that “Airplanes” (101) were sold 20 times on the 25th.
       Date        ProductDimID  NumOrders
       06/25/2010  101           20
       06/28/2010  102           30

     Customer Metric Table
     In this example you can see that Markus (1) placed one order, while John (2) placed five.
       Date        CustDimID  NumOrders
       06/25/2010  1          1
       06/28/2010  2          5
  16. Star Schema
       • Star schema: the diagram used to depict a traditional data mart is called a star schema. Typically a fact or metric table is placed in the center, and all dimension tables are laid out around it, giving the diagram a star-like appearance.

     Orders (center): Date, CustomerDimID, ProductDimID, EmployeeDimID, Price
     Product: ProductDimID, ProductName, ProductCode, ParentCode, Start, End
     Customer: CustomerDimID, CustID, Name, Age, Start, End
     Date: Date, Day of Week, Month Name, isHoliday
     Employee: EmployeeDimID, SSN, Manager SSN, Status, Start, End
  17. Using the Data Mart
       • You can then take these fact and dimension tables and place them in front of a reporting engine.
       • The user can then drill through the metrics.
       • The dimension tables allow the user to “pivot” the metrics through any attribute. They can go from viewing Customers by State to viewing Sales by Employees by switching dimensions.
       • If users consistently want to use one view of the data, you may decide to turn these into metric tables.
  18. Creating a Metric Table

     Schema
       Orders: Date, CustomerDimID, ProductDimID, EmployeeDimID, Price
       Product: ProductDimID, ProductName, ProductCode, ParentCode, Start, End
       Customer: CustomerDimID, CustID, Name, Age, Start, End
       Date: Date, Day of Week, Month Name, isHoliday
       Employee: EmployeeDimID, SSN, Manager SSN, Status, Start, End

     SQL: creating the metric table (every selected non-aggregate column must also appear in the GROUP BY)
       SELECT Orders.Date
             ,Orders.CustomerDimID
             ,COUNT(*) AS NumOrders
       FROM Orders
       GROUP BY Orders.Date, Orders.CustomerDimID

     Resulting metric table (Metric_NumOrders)
       Date        CustomerDimID  NumOrders
       06/25/2010  1              1
       06/28/2010  2              5

     SQL: reporting
       SELECT Metric_NumOrders.Date, Metric_NumOrders.NumOrders, Customer.Name
       FROM Metric_NumOrders
       INNER JOIN Customer
         ON Customer.CustomerDimID = Metric_NumOrders.CustomerDimID
       WHERE Metric_NumOrders.Date BETWEEN '01/01/2010' AND '01/31/2010'
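The two queries on this slide can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under assumptions: the order rows are invented January data so that the slide's date filter returns something, dates are stored as ISO strings so that `BETWEEN` compares them correctly as text, and the metric table is materialized with `CREATE TABLE ... AS SELECT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerDimID INTEGER PRIMARY KEY, CustID INTEGER, Name TEXT);
INSERT INTO Customer VALUES (1, 1001, 'Markus');
INSERT INTO Customer VALUES (2, 1010, 'John');

CREATE TABLE Orders (Date TEXT, CustomerDimID INTEGER,
                     ProductDimID INTEGER, Price REAL);
""")

# Hypothetical orders: one for Markus, five for John, and one in February
# that the January report should leave out.
orders = [("2010-01-25", 1, 101, 19.00)]
orders += [("2010-01-28", 2, 102, 10.00)] * 5
orders += [("2010-02-03", 1, 102, 10.00)]
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", orders)

# Build the metric table: one row per date and customer.
conn.execute("""
    CREATE TABLE Metric_NumOrders AS
    SELECT Orders.Date, Orders.CustomerDimID, COUNT(*) AS NumOrders
    FROM Orders
    GROUP BY Orders.Date, Orders.CustomerDimID
""")

# Reporting query: join the metric back to its dimension for display.
report = conn.execute("""
    SELECT m.Date, m.NumOrders, c.Name
    FROM Metric_NumOrders AS m
    INNER JOIN Customer AS c ON c.CustomerDimID = m.CustomerDimID
    WHERE m.Date BETWEEN '2010-01-01' AND '2010-01-31'
    ORDER BY m.Date
""").fetchall()
print(report)
```

Because the report reads the small pre-aggregated table instead of the full fact table, it stays fast as the number of orders grows.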
  19. References
       • A data mart is not a data warehouse
         http://www.information-management.com/infodirect/19991120/1675-1.html
       • General Data Warehousing Articles
         http://www.ralphkimball.com/html/articles.html
