Data Provisioning & Optimization

Published in: Technology

  1. 1. Data Provisioning An Enterprise and Business Perspective
  2. 2. Problem Statement… “There's too much data and it's duplicated hundreds of times. The mistake companies make is that they start from the data they have. They need to ask what data do their users need and what are the questions they are asking. Understand the questions, how they can be answered and what kind of data is needed.” Quote by the CIO of a major corporation
  3. 3. The main dimensions of BI : Reporting (Operational) <ul><li>Access and Format data from multiple disparate sources </li></ul><ul><ul><li>DB2, Oracle, Sybase or SalesForce.com </li></ul></ul><ul><li>Provides Holistic view of business </li></ul><ul><ul><li>New customer acquisition at Actuate touches: </li></ul></ul><ul><ul><ul><li>Billing </li></ul></ul></ul><ul><ul><ul><li>Licensing </li></ul></ul></ul><ul><ul><ul><li>Support </li></ul></ul></ul><ul><ul><ul><li>Shipping </li></ul></ul></ul><ul><li>Inherently Semantic in nature </li></ul><ul><ul><li>Customers, Order history, Balance Payment, Products </li></ul></ul>
  4. 4. The main dimensions of BI : Analysis (tactical) <ul><li>Need to view data in “dimensions” and “hierarchies” </li></ul><ul><ul><li>Sales (d) by Country (h), by Time period (h) </li></ul></ul><ul><li>Explore and Analyze </li></ul><ul><ul><li>Ad-hoc reporting </li></ul></ul><ul><ul><li>Pivot data </li></ul></ul><ul><ul><li>Drill down to details </li></ul></ul><ul><li>Highly summarized and concise view of data </li></ul><ul><li>Technology & Interfaces optimized for high performance </li></ul>
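To make "Sales by Country, by Time period" concrete, here is a minimal Python sketch of rolling a measure up along two hierarchies and then drilling down. The sample rows, the field names and the `roll_up` helper are illustrative assumptions, not something taken from the slides.

```python
from collections import defaultdict

# Hypothetical sales rows: (country, city, year, quarter, amount)
SALES = [
    ("USA", "New York", 2008, "Q1", 100.0),
    ("USA", "New York", 2008, "Q2", 150.0),
    ("USA", "Boston", 2008, "Q1", 50.0),
    ("Canada", "Toronto", 2008, "Q1", 70.0),
    ("Canada", "Toronto", 2009, "Q1", 30.0),
]

FIELDS = ("country", "city", "year", "quarter", "amount")


def roll_up(rows, levels):
    """Sum the 'amount' measure grouped by the given hierarchy levels."""
    idx = [FIELDS.index(level) for level in levels]
    amount_idx = FIELDS.index("amount")
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[amount_idx]
    return dict(totals)


# Summarized view: sales by country and year.
by_country_year = roll_up(SALES, ["country", "year"])

# Drill down: one level deeper in both hierarchies, for USA only.
usa_rows = [r for r in SALES if r[0] == "USA"]
usa_detail = roll_up(usa_rows, ["city", "quarter"])
```

Pivoting is the same operation with a different choice of `levels`; the rows themselves never change.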
  5. 5. The main dimensions of BI : Dashboards & Scorecards (strategic) <ul><li>Monitor KPIs, Metrics & Exceptions </li></ul><ul><li>Extensive use of Alerts and Tracking mechanisms </li></ul><ul><li>Data emphasized with graphs and visual controls </li></ul>
  6. 6. What do end users really care about? <ul><li>Easier access to data </li></ul><ul><li>Quick/Reasonable Performance </li></ul>
  7. 7. Report developer's worst nightmare…
  8. 8. Relational Rules <ul><li>Operational Applications </li></ul><ul><ul><li>Relational Database </li></ul></ul><ul><ul><li>Highly Normalized (3NF = Ideal) </li></ul></ul><ul><ul><li>Emphasis on Keys, Relationships, Joins </li></ul></ul><ul><li>OLTP Database </li></ul><ul><ul><li>Designed to support Applications </li></ul></ul><ul><ul><li>Ideal design and model for: </li></ul></ul><ul><ul><ul><li>Lots of Users, Small slices of data </li></ul></ul></ul><ul><ul><ul><li>E.g. Debit a/c# 1001, with amount $1000.00, for withdrawal from ATM Thornall Street </li></ul></ul></ul><ul><ul><li>Very Bad model for: </li></ul></ul><ul><ul><ul><li>Small # of users and large slices of data </li></ul></ul></ul><ul><ul><ul><li>Sums, Aggregations, Calculations & Complex Business logic. </li></ul></ul></ul>
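The ATM example above is a typical OLTP unit of work: one user, one small slice of data, touched through keys. A minimal sketch using Python's built-in `sqlite3` module follows. The table and column names, the opening balance and the `debit` helper are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL NOT NULL);
    CREATE TABLE atm (atm_id INTEGER PRIMARY KEY, location TEXT NOT NULL);
    CREATE TABLE withdrawal (
        withdrawal_id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES account(account_id),
        atm_id INTEGER NOT NULL REFERENCES atm(atm_id),
        amount REAL NOT NULL
    );
    INSERT INTO account VALUES (1001, 2500.00);  -- assumed opening balance
    INSERT INTO atm VALUES (1, 'Thornall Street');
""")


def debit(conn, account_id, atm_id, amount):
    """Record one withdrawal and update the balance in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE account_id = ?",
            (amount, account_id),
        )
        conn.execute(
            "INSERT INTO withdrawal (account_id, atm_id, amount) VALUES (?, ?, ?)",
            (account_id, atm_id, amount),
        )


# Debit a/c# 1001 with $1000.00 for a withdrawal at the Thornall Street ATM.
debit(conn, 1001, 1, 1000.00)
balance = conn.execute(
    "SELECT balance FROM account WHERE account_id = 1001"
).fetchone()[0]
```

Each fact is stored once and reached by key, which suits this kind of write. An aggregate over many withdrawals would have to join all three tables, which is where the normalized model starts to hurt.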
  9. 9. Relational Rules, slightly bent <ul><li>Information Applications </li></ul><ul><ul><li>Highly De-normalized </li></ul></ul><ul><ul><li>Dimensioned </li></ul></ul><ul><ul><li>ODS or Data Warehouse </li></ul></ul><ul><li>OLAP </li></ul><ul><ul><li>Recommended model for: </li></ul></ul><ul><ul><ul><li>Small # of users & Large slices of data </li></ul></ul></ul><ul><ul><ul><li>Aggregates, pre-calculated values and ability to slice and dice data in multiple ways (e.g. Actuate Software sales by region, country, sales rep.) </li></ul></ul></ul><ul><ul><ul><li>E.g. What proportion of ATM withdrawals occur within the area of the person’s primary address? </li></ul></ul></ul><ul><ul><li>Very Bad model for: </li></ul></ul><ul><ul><ul><li>Running Operational applications </li></ul></ul></ul><ul><ul><ul><li>E.g. Niku, PeopleSoft Finance </li></ul></ul></ul>
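The ATM question above reads a large slice of data and needs no keys at all, which is why a de-normalized layout suits it. Below is a minimal `sqlite3` sketch. The `withdrawal_wide` table, its columns and its rows are hypothetical, and reading "within the person's primary address" as "in the same city" is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One wide, de-normalized row per withdrawal: no joins needed to analyze it.
    CREATE TABLE withdrawal_wide (
        account_id INTEGER,
        home_city TEXT,
        atm_city TEXT,
        amount REAL
    );
    INSERT INTO withdrawal_wide VALUES
        (1001, 'Edison', 'Edison', 1000.00),
        (1001, 'Edison', 'Newark', 200.00),
        (1002, 'Newark', 'Newark', 50.00),
        (1003, 'Trenton', 'Edison', 75.00);
""")

# Proportion of withdrawals made in the account holder's home city.
home_share = conn.execute("""
    SELECT AVG(CASE WHEN home_city = atm_city THEN 1.0 ELSE 0.0 END)
    FROM withdrawal_wide
""").fetchone()[0]
```

The home city is repeated on every row. That redundancy would be unwelcome in an operational system, but here it is what keeps the query to a single table scan.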
  10. 10. Dimensional Modeling: Star Schema
  11. 11. Recognizing ‘Facts’ and ‘Dimensions’ <ul><li>Facts </li></ul><ul><ul><li>Hold what you are trying to MEASURE (e.g. Sales, Expenditures) </li></ul></ul><ul><ul><li>Usually numeric </li></ul></ul><ul><li>Dimensions </li></ul><ul><ul><li>Qualify measures (e.g. Products, Departments, Time) </li></ul></ul><ul><ul><li>Hierarchical in nature (NAO  East  West  Central) </li></ul></ul>
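A star schema puts the measures in one central fact table and gives each dimension its own table, joined by a key. The following `sqlite3` sketch is one possible layout. All table names, column names and values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_type TEXT);
    CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, region TEXT);
    -- The fact table holds the numeric measure plus one key per dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        region_key INTEGER REFERENCES dim_region(region_key),
        sales_amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'License'), (2, 'Support');
    INSERT INTO dim_region VALUES (1, 'East'), (2, 'West');
    INSERT INTO fact_sales VALUES
        (1, 1, 500.0), (1, 2, 300.0), (2, 1, 120.0), (1, 1, 80.0);
""")

# The measure (sales_amount) qualified by one dimension (region).
sales_by_region = dict(conn.execute("""
    SELECT r.region, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_region AS r ON r.region_key = f.region_key
    GROUP BY r.region
""").fetchall())
```

Swapping `dim_region` for `dim_product` in the join answers a different question from the same fact table, with no new table required.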
  12. 12. Extracting requirements to build the data model <ul><li>Business users drive processes and projects </li></ul><ul><ul><li>Do not ask precise questions such as “what numbers do you want?” </li></ul></ul><ul><ul><ul><li>They want everything in a report or flat file so they can do their own analysis </li></ul></ul></ul><ul><ul><ul><li>Usually a huge gap exists between what they WANT and what they NEED </li></ul></ul></ul><ul><ul><li>Have them express their needs in analytical terms </li></ul></ul><ul><ul><ul><li>I would like to know the distribution of Actuate Software Sales by Product type and Region. </li></ul></ul></ul><ul><ul><ul><li>What is the proportion of Revenue that comes from ‘A (new)’ versus ‘B (existing)’ customers? </li></ul></ul></ul><ul><ul><li>Your Objective? </li></ul></ul><ul><ul><ul><li>To identify the ‘measures’ and ‘dimensions’ needed for your data model </li></ul></ul></ul><ul><ul><ul><li>I would like to know the distribution of Actuate Software Sales by Product type and Region </li></ul></ul></ul><ul><ul><ul><li>What is the proportion of Revenue that comes from ‘A (new)’ versus ‘B (existing)’ customers? </li></ul></ul></ul>
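One way to capture the outcome of such an interview is to reduce each analytical question to a measure and its dimensions, as in the sketch below. The `Requirement` class, the dimension names and the revenue figures are hypothetical and serve only to show the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """An analytical question reduced to one measure and its dimensions."""
    question: str
    measure: str
    dimensions: tuple


REQUIREMENTS = [
    Requirement(
        question="Distribution of sales by product type and region",
        measure="sales",
        dimensions=("product_type", "region"),
    ),
    Requirement(
        question="Proportion of revenue from new versus existing customers",
        measure="revenue",
        dimensions=("customer_status",),
    ),
]

# Hypothetical revenue rows: (customer_status, revenue)
REVENUE = [("new", 200.0), ("existing", 500.0), ("new", 100.0), ("existing", 200.0)]


def proportions(rows):
    """Return each category's share of the total measure."""
    totals = defaultdict(float)
    for category, value in rows:
        totals[category] += value
    grand_total = sum(totals.values())
    return {category: value / grand_total for category, value in totals.items()}


revenue_share = proportions(REVENUE)

# The union of all dimensions is the starting list for the data model.
model_dimensions = sorted({d for r in REQUIREMENTS for d in r.dimensions})
```

Collecting the requirements in this form makes it easy to see which dimensions several questions share and which measures are still missing.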
  13. 13. Data Provisioning Life-Cycle
  14. 14. Different approaches to Data Warehouse <ul><li>Requires Vision, Patience, Sponsorship and Money!! </li></ul><ul><li>Enterprise View Unavailable </li></ul><ul><li>High data latency </li></ul><ul><li>High ETL, App. and DBA cost </li></ul><ul><li>Suitable for low volumes of data </li></ul><ul><li>Network Bandwidth and Join complexity issues </li></ul><ul><li>Difficult to manage Metadata </li></ul><ul><li>Enterprise View Unavailable </li></ul><ul><li>Redundant and replicated data </li></ul><ul><li>High ETL, App. and DBA cost </li></ul>CONS <ul><li>Provides Enterprise Business view of data </li></ul><ul><li>Consistent design and High Data Quality </li></ul><ul><li>Supports ease of customization and reports development </li></ul><ul><li>No need for ETL </li></ul><ul><li>No need for another platform/database </li></ul><ul><li>Easy to get Buy-in </li></ul><ul><li>Easy to Build </li></ul>PROS Centralized, Integrated Data with Direct access EDW Dependent Data Marts Leave Data where it is Independent Data Marts
  18. 18. Actuate Information Objects as Middleware <ul><li>Integrates Data from multiple sources </li></ul><ul><li>Presents data in Business friendly terms </li></ul><ul><li>Suitable for small to medium volumes of data </li></ul>
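The general idea of such a middleware layer, integrating several sources and exposing them under business-friendly names, can be sketched in plain Python. This is not Actuate's API: the source rows, the column mappings and the `rename` and `integrate` helpers are all hypothetical.

```python
# Two hypothetical sources with system-specific column names.
BILLING_ROWS = [
    {"CUST_NO": 1, "BAL_AMT": 250.0},
    {"CUST_NO": 2, "BAL_AMT": 0.0},
]
SUPPORT_ROWS = [
    {"cust_id": 1, "open_tkts": 3},
    {"cust_id": 2, "open_tkts": 0},
]

# Business-friendly names for each source column.
BILLING_MAP = {"CUST_NO": "Customer ID", "BAL_AMT": "Balance Payment"}
SUPPORT_MAP = {"cust_id": "Customer ID", "open_tkts": "Open Support Tickets"}


def rename(rows, mapping):
    """Translate source column names into business terms."""
    return [{mapping[k]: v for k, v in row.items()} for row in rows]


def integrate(*sources, key="Customer ID"):
    """Merge rows from several renamed sources that share the same key."""
    merged = {}
    for rows in sources:
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


customer_view = integrate(
    rename(BILLING_ROWS, BILLING_MAP),
    rename(SUPPORT_ROWS, SUPPORT_MAP),
)
```

The merge here happens in memory at query time, which is consistent with the slide's caveat that this approach suits small to medium volumes of data.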
  19. 19. Information Pyramid: What, When & Where? (diagram labeled 0 to 5)
  20. 20. How companies end up with 1000’s of reports <ul><li>Client with International operations, multiple systems and many databases created to support local ad-hoc and transaction management. </li></ul><ul><li>Unable to consolidate data to satisfy reporting requirements. </li></ul><ul><li>Decides to build a Data Warehouse. </li></ul><ul><li>After many iterations, DW is finally complete with information from many subject areas and provides significant strategic business value. </li></ul><ul><li>The new DW satisfied many of the standard and static reporting needs (reports once created, did not need to be modified). </li></ul><ul><li>However as the # of users of the DW grew, reporting requirements became more complex in nature. </li></ul><ul><li>The client soon realized that the reports became more and more dynamic in nature. </li></ul>
  21. 21. How companies end up with 1000’s of reports…contd. <ul><li>The growth in # of reports was due to the fact that users now had access to data and could view and analyze it in multiple ways; thus the need for Ad-hoc reporting emerged. </li></ul><ul><li>The increase in # of reports and the increase in volume of data being requested by them had a drastic impact on performance. </li></ul><ul><li>Upon analysis the client found that the data warehouse was in 3NF and followed an E/R model. </li></ul><ul><li>To solve the performance problem the client created a new set of de-normalized tables in the DW. </li></ul><ul><li>The process of de-normalizing tables was done on a report-by-report basis. </li></ul>
  22. 22. How companies end up with 1000’s of reports…contd. <ul><li>Very soon the client ended up with a DW and a few thousand de-normalized tables. </li></ul><ul><li>These de-normalized tables were supposed to provide easy and fast access to data. </li></ul><ul><li>However in reality, these tables were very complex to manage, held inconsistent data and were difficult to link. </li></ul><ul><li>This made the DW very inefficient to maintain and use for reporting. </li></ul><ul><li>The client realized the problems of creating de-normalized tables for every report and decided to adopt another strategy. </li></ul>
  23. 23. How companies end up with 1000’s of reports…contd. <ul><li>The client then decided to undertake a new project to maintain the reporting tables or data structures in a separate, ‘dimensioned’ database or ‘data mart’. </li></ul><ul><li>The new dimensioned model would not only help users with ad-hoc analysis, but also be easier to read and understand. </li></ul><ul><li>In addition, the dimensioned model would give much better performance than the E/R modeled DW </li></ul>
  24. 24. What would you have done differently?