Ralph Kimball – In simplest terms, a data warehouse can be defined as a collection of data marts. Data mart: a subject-specific collection of data. Bill Inmon – A data warehouse is a “subject-oriented, integrated, time-variant, and nonvolatile” collection of data in support of management’s decision-making process. <ul><li>ERP </li></ul><ul><li>will run the business </li></ul><ul><li>- like how tyres run the car </li></ul><ul><li>BI (reports, data mining, dashboards, KPIs) </li></ul><ul><li>will help you take business decisions based on your historical data. </li></ul><ul><li>- like how the steering, mirrors, brakes, and dashboard help you run the car smoothly and reach the destination. </li></ul>
In what way does a data warehouse help a business? Let’s say a producer wants to know: Which are our lowest/highest margin customers? Who are my customers and what products are they buying? Which customers are most likely to go to the competition? What impact will new products/services have on revenue and margins? What product promotions have the biggest impact on revenue? What is the most effective distribution channel?
Data, data everywhere, yet ... <ul><li>I can’t find the data I need </li></ul><ul><ul><li>data is scattered over the network </li></ul></ul><ul><ul><li>many versions, subtle differences </li></ul></ul><ul><li>I can’t get the data I need </li></ul><ul><ul><li>need an expert to get the data </li></ul></ul><ul><li>I can’t understand the data I found </li></ul><ul><ul><li>available data is poorly documented </li></ul></ul><ul><li>I can’t use the data I found </li></ul><ul><ul><li>results are unexpected </li></ul></ul><ul><ul><li>data needs to be transformed from one form to another </li></ul></ul>
A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context. [Barry Devlin]
What are the users saying... <ul><li>Data should be integrated across the enterprise </li></ul><ul><li>Summary data has a real value to the organization </li></ul><ul><li>Historical data holds the key to understanding data over time </li></ul><ul><li>What-if capabilities are required </li></ul>
A process of transforming data into information and making it available to users in a timely enough manner to make a difference [Forrester Research, April 1996] Data → Information
Data Warehousing -- It is a process <ul><li>A technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible </li></ul><ul><li>A decision support database maintained separately from the organization’s operational database </li></ul>
Data Mining works with Warehouse Data. Data warehousing provides the enterprise with a memory. Data mining provides the enterprise with intelligence.
We want to know ... <ul><li>Given a database of 100,000 names, which persons are the least likely to default on their credit cards? </li></ul><ul><li>Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer? </li></ul><ul><li>If I raise the price of my product by Rs. 2, what is the effect on my ROI? </li></ul><ul><li>If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result? </li></ul><ul><li>If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues? </li></ul><ul><li>Which of my customers are likely to be the most loyal? </li></ul>Data Mining helps to extract such information
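[Figure: price comparison, Oracle 10g vs. IBM DB2. Components shown: Base Product. Price labels: $25K, $40K, $25K.]
[Figure: price comparison, Oracle 10g vs. IBM DB2. Components shown: Base Product, Manageability (included), High Availability (Data Guard $116K, Recovery Expert $10K), Business Intelligence. Price labels: $25K, $154.5K, $164.5K, $232K, $116K.]
[Figure: price comparison, Oracle 10g vs. IBM DB2. Components shown: Base Product, Manageability (included), High Availability, Business Intelligence, Multi-core. Price labels: $25K, $164.5K, $164.5K, $232K, $329K, $116K - $232K, $348K - $464K.]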
[Figure: additional benefit plotted against number of users, with one question per stage: What happened? Why did it happen? What will happen? What happened, why, and how?]
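<ul><li>OLTP – Online Transaction Processing </li></ul><ul><li>OLAP – Online Analytical Processing </li></ul><ul><li>MOLAP – Multidimensional OLAP </li></ul><ul><li>ROLAP – Relational OLAP </li></ul><ul><li>HOLAP – Hybrid OLAP </li></ul><ul><li>Dimensions – De-normalized master tables </li></ul><ul><li>Attributes – Columns of dimensions </li></ul><ul><li>Hierarchies – Sequential order of attributes </li></ul><ul><li>Facts (measure groups) – Transaction tables in the DWH </li></ul><ul><li>Fact (measures) </li></ul><ul><li>Cubes – Multidimensional storage of data </li></ul><ul><li>KPIs – Key performance indicators </li></ul><ul><li>Dashboards – Combination of reports, KPIs, charts </li></ul><ul><li>Data Marts – Subject-specific collection of data </li></ul><ul><li>SCDs – Slowly changing dimensions </li></ul><ul><li>Perspectives – Child cube </li></ul>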
Operational Data Sources → Data Migration Middleware (Population Tools) → Data Storage (Repository) → Data Analysis (Reporting, OLAP, Data Mining)
[Figure: Microsoft BI architecture. Labels: OLTP, Stage DB (optional), Data Marts (ROLAP), Cube (MOLAP), OLAP; SSIS (Integration Services), SSAS (Analysis Services), SSRS (Reporting Services).]
OLTP (online transaction processing) <ul><li>Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc. </li></ul><ul><li>The tables are in normalized form. </li></ul><ul><li>The storage objects are called tables, i.e., all the masters and the transactions are stored in tables. </li></ul><ul><li>OLTP is designed using data modeling. </li></ul> OLAP (online analytical processing) <ul><li>Data analysis and decision making </li></ul><ul><li>The tables are in de-normalized form. </li></ul><ul><li>The storage objects are called dimensions and facts, i.e., all the masters are dimensions and the transactions are facts. </li></ul><ul><li>OLAP is designed using dimensional modeling. </li></ul><ul><li>OLAP is classified into two types: MOLAP and ROLAP. </li></ul>
Topics we will cover later: <ul><li>1. Types of Dimensions </li></ul><ul><li>2. Slowly Changing Dimensions </li></ul><ul><li>3. Hierarchies </li></ul> Normalized tables: <ul><li>Product: Prod_Id, Prod_Name, Base_Rate, Cat_Id </li></ul><ul><li>Category: Cat_Id, Cat_Name, Cat_Desc, Group_Id </li></ul><ul><li>Group: Group_Id, Group_Name, Group_Desc </li></ul> De-normalized table: <ul><li>Product_Dim: Prod_Id, Prod_Name, Base_Rate, Cat_Name, Cat_Desc, Group_Name, Group_Desc </li></ul>
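As a minimal sketch of how the normalized Product, Category, and Group tables could be flattened into Product_Dim, the Python below follows the foreign keys and copies the descriptive columns onto each product row. The column names come from the slide; the sample values and the helper name `build_product_dim` are hypothetical.

```python
# Hypothetical sample rows; column names follow the slide, values are made up.
groups = {1: {"Group_Name": "Electronics", "Group_Desc": "Electronic goods"}}
categories = {
    10: {"Cat_Name": "Phones", "Cat_Desc": "Mobile phones", "Group_Id": 1},
}
products = [
    {"Prod_Id": 100, "Prod_Name": "Phone X", "Base_Rate": 250.0, "Cat_Id": 10},
]


def build_product_dim(products, categories, groups):
    """Flatten Product -> Category -> Group into one row per product."""
    dim_rows = []
    for prod in products:
        cat = categories[prod["Cat_Id"]]   # follow Product.Cat_Id
        grp = groups[cat["Group_Id"]]      # follow Category.Group_Id
        dim_rows.append({
            "Prod_Id": prod["Prod_Id"],
            "Prod_Name": prod["Prod_Name"],
            "Base_Rate": prod["Base_Rate"],
            "Cat_Name": cat["Cat_Name"],
            "Cat_Desc": cat["Cat_Desc"],
            "Group_Name": grp["Group_Name"],
            "Group_Desc": grp["Group_Desc"],
        })
    return dim_rows


product_dim = build_product_dim(products, categories, groups)
```

The lookup keys (Cat_Id, Group_Id) are dropped from the result because the dimension row already carries the names and descriptions they pointed to.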
Qty * Unit_Price + Tax = Total_Amount. Calculations are usually performed before the data is stored in OLAP. In the fact table, the key columns reference the dimensions, and the numeric fields are called facts or measures. <ul><li>SalesOrder_Fact: Cust_Id, Prod_Id, Order_Date, Delivery_Date, Unit_Price, Qty, Total_Amount, Tax </li></ul><ul><li>SalesOrderDetails: Cust_Id, SalesPerson, Prod_Id, Order_Date, Booked_Date, Delivery_Date, Unit_Price, Qty, Tax, Created_By </li></ul>
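The precalculation step can be illustrated with a short sketch that maps one SalesOrderDetails row to a SalesOrder_Fact row and computes Total_Amount on the way. The column names are taken from the slide; the function name `to_fact_row` and the sample values are assumptions made for illustration.

```python
# Columns of SalesOrderDetails that are carried into SalesOrder_Fact unchanged.
FACT_COLUMNS = ["Cust_Id", "Prod_Id", "Order_Date", "Delivery_Date",
                "Unit_Price", "Qty", "Tax"]


def to_fact_row(detail):
    """Build a SalesOrder_Fact row, precomputing Total_Amount."""
    fact = {col: detail[col] for col in FACT_COLUMNS}
    # Qty * Unit_Price + Tax = Total_Amount, calculated before loading.
    fact["Total_Amount"] = detail["Qty"] * detail["Unit_Price"] + detail["Tax"]
    return fact


# Hypothetical source row; values are made up.
detail = {
    "Cust_Id": 7, "SalesPerson": "A. Rao", "Prod_Id": 100,
    "Order_Date": "2024-01-15", "Booked_Date": "2024-01-14",
    "Delivery_Date": "2024-01-20", "Unit_Price": 100.0, "Qty": 3,
    "Tax": 15.0, "Created_By": "system",
}
fact = to_fact_row(detail)
```

Columns such as SalesPerson, Booked_Date, and Created_By are left behind because they do not appear in the SalesOrder_Fact layout.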
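STAR Schema <ul><li>Prod_Dim: Prod_Id, ……… </li></ul><ul><li>Cust_Dim: Cust_Id, ……… </li></ul><ul><li>Time_Dim: Date, Year, Month, ……… </li></ul><ul><li>Org_Dim: Org_Id, ……… </li></ul><ul><li>SalesOrder_Fact: Cust_Id, Prod_Id, Order_Date, Delivery_Date, Org_Id, Unit_Price, Qty, Total_Amount, Tax </li></ul>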
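A query against a star schema typically joins the fact table to each dimension it needs and aggregates a measure. The sketch below uses Python's built-in sqlite3 module with a trimmed-down version of the tables; Prod_Name and Cust_Name are assumed columns (the slide elides the remaining dimension attributes), and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Prod_Dim (Prod_Id INTEGER PRIMARY KEY, Prod_Name TEXT);
CREATE TABLE Cust_Dim (Cust_Id INTEGER PRIMARY KEY, Cust_Name TEXT);
CREATE TABLE Time_Dim (Date TEXT PRIMARY KEY, Year INTEGER, Month INTEGER);
CREATE TABLE SalesOrder_Fact (
    Cust_Id INTEGER REFERENCES Cust_Dim(Cust_Id),
    Prod_Id INTEGER REFERENCES Prod_Dim(Prod_Id),
    Order_Date TEXT REFERENCES Time_Dim(Date),
    Qty INTEGER,
    Total_Amount REAL
);
INSERT INTO Prod_Dim VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO Cust_Dim VALUES (1, 'Acme');
INSERT INTO Time_Dim VALUES ('2024-01-15', 2024, 1), ('2024-02-10', 2024, 2);
INSERT INTO SalesOrder_Fact VALUES
    (1, 1, '2024-01-15', 2, 200.0),
    (1, 1, '2024-02-10', 1, 100.0),
    (1, 2, '2024-02-10', 5, 750.0);
""")

# Revenue per year and product: the fact table joins directly to each dimension.
rows = conn.execute("""
SELECT t.Year, p.Prod_Name, SUM(f.Total_Amount) AS Revenue
FROM SalesOrder_Fact AS f
JOIN Prod_Dim AS p ON p.Prod_Id = f.Prod_Id
JOIN Time_Dim AS t ON t.Date = f.Order_Date
GROUP BY t.Year, p.Prod_Name
ORDER BY p.Prod_Name
""").fetchall()
```

Every dimension is one join away from the fact table, which is what gives the schema its star shape.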
Star schema <ul><li>1. Dimensions relate only to the fact table. (De-normalized model) </li></ul><ul><li>2. One-to-many or one-to-one relations occur. </li></ul><ul><li>3. Performance is fast, but it requires a large amount of storage space. </li></ul> Snowflake schema <ul><li>1. Dimensions also relate to tables other than the fact table. (Normalized model) </li></ul><ul><li>2. Used for many-to-many relations. </li></ul><ul><li>3. Performance is slower, but it requires less storage space. </li></ul>
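The trade-off in this comparison, fewer joins against less duplicated data, can be sketched by answering the same question on both layouts, conventionally called star and snowflake schemas. The example uses Python's built-in sqlite3 module; the table contents are made up, and only a product dimension is modeled to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake layout: the product dimension is normalized into two tables.
CREATE TABLE Category (Cat_Id INTEGER PRIMARY KEY, Cat_Name TEXT);
CREATE TABLE Product (Prod_Id INTEGER PRIMARY KEY, Prod_Name TEXT, Cat_Id INTEGER);
-- Star layout: the same attributes flattened into one dimension table.
CREATE TABLE Product_Dim (Prod_Id INTEGER PRIMARY KEY, Prod_Name TEXT, Cat_Name TEXT);
CREATE TABLE SalesOrder_Fact (Prod_Id INTEGER, Total_Amount REAL);

INSERT INTO Category VALUES (10, 'Phones'), (20, 'Laptops');
INSERT INTO Product VALUES (1, 'Phone X', 10), (2, 'Phone Y', 10), (3, 'Laptop Z', 20);
INSERT INTO Product_Dim VALUES
    (1, 'Phone X', 'Phones'), (2, 'Phone Y', 'Phones'), (3, 'Laptop Z', 'Laptops');
INSERT INTO SalesOrder_Fact VALUES (1, 100.0), (2, 150.0), (3, 900.0);
""")

# Star: one join, but 'Phones' is stored once per product row.
star_rows = conn.execute("""
SELECT d.Cat_Name, SUM(f.Total_Amount)
FROM SalesOrder_Fact AS f
JOIN Product_Dim AS d ON d.Prod_Id = f.Prod_Id
GROUP BY d.Cat_Name ORDER BY d.Cat_Name
""").fetchall()

# Snowflake: two joins, but each category name is stored only once.
snowflake_rows = conn.execute("""
SELECT c.Cat_Name, SUM(f.Total_Amount)
FROM SalesOrder_Fact AS f
JOIN Product AS p ON p.Prod_Id = f.Prod_Id
JOIN Category AS c ON c.Cat_Id = p.Cat_Id
GROUP BY c.Cat_Name ORDER BY c.Cat_Name
""").fetchall()
```

Both queries return the same totals per category; the difference lies in how many tables must be joined and how often descriptive values are repeated.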