An Overview of Data Warehousing and OLAP Technology


Published in: Technology, Business

  1. 1. An overview of Data Warehousing and OLAP Technology <ul><li>Presented By </li></ul><ul><li>Manish Desai </li></ul>
  2. 2. <ul><li>Introduction </li></ul><ul><li>What is a data warehouse? </li></ul><ul><li>Explanation of definition </li></ul><ul><li>Data warehouse vs. operational database </li></ul><ul><li>Data warehouse architecture </li></ul><ul><li>Back end tools </li></ul><ul><li>Conceptual model </li></ul><ul><li>Database design </li></ul><ul><li>Warehouse servers </li></ul><ul><li>Index structures </li></ul><ul><li>Metadata </li></ul><ul><li>Conclusion </li></ul><ul><li>References </li></ul>
  3. 3. Introduction <ul><li>Data warehousing and OLAP are essential elements of decision support </li></ul><ul><li>They enable the knowledge worker to make better and faster decisions </li></ul><ul><li>Used in many industries, such as: </li></ul><ul><ul><li>Manufacturing (for order shipment) </li></ul></ul><ul><ul><li>Retail (for inventory management) </li></ul></ul><ul><ul><li>Financial services (claims and risk analysis) </li></ul></ul><ul><li>Every major database vendor offers products in this area </li></ul>
  4. 4. What is a Data Warehouse? <ul><li>A data warehouse is a “subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making” </li></ul><ul><li>Typically maintained separately from operational databases </li></ul>
  5. 5. Explanation of definition <ul><li>Subject-Oriented: </li></ul><ul><ul><li>Designed around subjects such as customer, vendor, product and activity </li></ul></ul><ul><ul><li>Does not include data that is not needed for the decision support system (DSS) </li></ul></ul><ul><li>Integrated: </li></ul><ul><ul><li>Most important feature </li></ul></ul><ul><ul><li>Consistent naming conventions, measurement of variables and so forth </li></ul></ul><ul><ul><li>The data should be stored in a single, globally accepted fashion </li></ul></ul>
  6. 6. Explanation (continued) <ul><li>Time Varying: </li></ul><ul><ul><li>All data in the warehouse should be accurate as of some moment in time </li></ul></ul><ul><ul><li>Data is stored over a long time horizon (5 to 10 years) </li></ul></ul><ul><ul><li>The key structure contains an element of time (implicitly or explicitly) </li></ul></ul><ul><ul><li>Data, once correctly recorded, cannot be updated </li></ul></ul><ul><li>Non-Volatile: </li></ul><ul><ul><li>No update of data is allowed </li></ul></ul><ul><ul><li>Only two kinds of operations: loading data and accessing data </li></ul></ul>
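The time-varying and non-volatile properties above can be sketched as an append-only store whose key includes a time element. This is only an illustration: the names `Snapshot`, `AppendOnlyStore`, `load` and `history` are hypothetical and do not come from the slides or from any warehouse product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One immutable warehouse row; the key is (customer_id, as_of)."""
    customer_id: int
    balance: float
    as_of: date


class AppendOnlyStore:
    """Hypothetical store that allows only loading and reading."""

    def __init__(self):
        self._rows = {}

    def load(self, row):
        key = (row.customer_id, row.as_of)  # time is part of the key
        if key in self._rows:
            # Non-volatile: an existing snapshot is never overwritten.
            raise ValueError(f"snapshot {key} already loaded; updates are not allowed")
        self._rows[key] = row

    def history(self, customer_id):
        """Return all snapshots for one customer, oldest first."""
        rows = (r for r in self._rows.values() if r.customer_id == customer_id)
        return sorted(rows, key=lambda r: r.as_of)


store = AppendOnlyStore()
store.load(Snapshot(1, 100.0, date(2024, 1, 1)))
store.load(Snapshot(1, 150.0, date(2024, 2, 1)))
```

A change in the source system shows up as a new snapshot with a later `as_of` value, so the history stays intact.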
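  7. 7. Data Warehouse Vs. Operational Database <table><tr><th></th><th>Operational Database</th><th>Data Warehouse</th></tr><tr><td>User</td><td>Clerk, IT professional</td><td>Knowledge worker</td></tr><tr><td>Function</td><td>Day-to-day operations</td><td>Decision support</td></tr><tr><td>Data</td><td>Current, up-to-date, detailed</td><td>Historical, summarized, multidimensional, integrated</td></tr><tr><td>Unit of work</td><td>Short, simple transaction</td><td>Complex query</td></tr><tr><td>Metric</td><td>Transaction throughput</td><td>Query throughput, response</td></tr></table>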
  8. 8. Architecture <ul><li>Data sourcing, migration and cleanup tools </li></ul><ul><li>Metadata repository </li></ul><ul><li>Data marts </li></ul><ul><li>Data query, reporting, analysis and mining tools </li></ul><ul><li>Data warehouse administration and management </li></ul>
  9. 9. Architecture (continued) <ul><li>Distributed data warehouse </li></ul><ul><ul><li>Load balancing, scalability, higher availability </li></ul></ul><ul><ul><li>Metadata is replicated and centrally administered </li></ul></ul><ul><ul><li>Too expensive </li></ul></ul><ul><li>Data marts </li></ul><ul><ul><li>Departmental subsets focused on selected subjects </li></ul></ul><ul><ul><li>Example: a marketing department data mart includes customer, sales and product tables </li></ul></ul><ul><ul><li>Each has its own repository and administration </li></ul></ul><ul><ul><li>May lead to complex integration problems if not designed properly </li></ul></ul>
  10. 10. Back end tools and Utilities <ul><li>Data cleaning, loading and refreshing tools </li></ul><ul><li>Cleaning </li></ul><ul><ul><li>Data comes from multiple sources, so errors are likely </li></ul></ul><ul><ul><li>Example: replace the string sex by gender </li></ul></ul><ul><li>Loading </li></ul><ul><ul><li>Building indices, sorting and making access paths </li></ul></ul><ul><ul><li>Large amounts of data </li></ul></ul><ul><ul><ul><li>Incremental loading </li></ul></ul></ul><ul><ul><ul><li>Only updated tuples are inserted; the process is hard to manage </li></ul></ul></ul><ul><li>Refresh </li></ul><ul><ul><li>Propagating updates from the sources to the warehouse </li></ul></ul><ul><ul><li>When to refresh? </li></ul></ul><ul><ul><li>Set by the administrator depending on user needs and traffic </li></ul></ul>
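The cleaning and incremental loading steps above might look roughly like the sketch below. It assumes the slide's example means renaming a source field `sex` to `gender`, and that each source row carries a `modified` counter; the function names `clean` and `incremental_load` and the row layout are hypothetical.

```python
def clean(record):
    """Return a copy of the record with the field 'sex' renamed to 'gender'."""
    cleaned = dict(record)
    if "sex" in cleaned:
        cleaned["gender"] = cleaned.pop("sex")
    return cleaned


def incremental_load(warehouse, source_rows, last_load):
    """Append only rows modified after last_load; return the new high-water mark."""
    newest = last_load
    for row in source_rows:
        if row["modified"] > last_load:
            warehouse.append(clean(row))
            newest = max(newest, row["modified"])
    return newest


source = [
    {"id": 1, "sex": "F", "modified": 1},
    {"id": 2, "sex": "M", "modified": 2},
    {"id": 3, "gender": "F", "modified": 3},
]
warehouse = []
mark = incremental_load(warehouse, source, last_load=1)
```

Keeping the returned mark and passing it to the next run is what makes the load incremental; tracking that mark reliably across many sources is part of why the process is hard to manage.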
  11. 11. Conceptual Model and front end tools <ul><li>Multidimensional view </li></ul><ul><ul><li>Dimensions together uniquely determine the measure </li></ul></ul><ul><ul><li>Example: sales can be represented by the dimensions city, product and date </li></ul></ul><ul><ul><li>Each dimension is described by a set of attributes </li></ul></ul><ul><ul><li>Example: product consists of </li></ul></ul><ul><ul><ul><li>Category of product </li></ul></ul></ul><ul><ul><ul><li>Industry of product </li></ul></ul></ul><ul><ul><ul><li>Year of introduction </li></ul></ul></ul><ul><li>Front end tools </li></ul><ul><ul><li>Multidimensional spreadsheet </li></ul></ul><ul><ul><ul><li>Pivoting: reorient the view of the data </li></ul></ul></ul><ul><ul><ul><li>Roll-up: summarize data </li></ul></ul></ul><ul><ul><ul><li>Drill-down: go from a high-level summary to a lower-level, more detailed one </li></ul></ul></ul>
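As a rough illustration of the multidimensional view, the sketch below stores a sales measure keyed by (city, product, date) and implements roll-up, drill-down and pivoting over it. The sample figures and the function names `roll_up`, `drill_down` and `pivot` are made up for this example.

```python
from collections import defaultdict

# Measure: sales amount. Dimensions (by position): 0=city, 1=product, 2=date.
sales = {
    ("Boston", "laptop", "2024-01-05"): 1200.0,
    ("Boston", "phone", "2024-01-05"): 800.0,
    ("Austin", "laptop", "2024-01-06"): 1000.0,
    ("Boston", "laptop", "2024-01-06"): 500.0,
}


def roll_up(cube, keep):
    """Sum the measure over every dimension whose position is not in keep."""
    totals = defaultdict(float)
    for dims, value in cube.items():
        totals[tuple(dims[i] for i in keep)] += value
    return dict(totals)


def drill_down(cube, fixed):
    """Return the detailed cells behind a summary; fixed maps position to value."""
    return {
        dims: value
        for dims, value in cube.items()
        if all(dims[i] == wanted for i, wanted in fixed.items())
    }


def pivot(cube, order):
    """Reorient the cube by reordering its dimensions."""
    return {tuple(dims[i] for i in order): value for dims, value in cube.items()}


by_city = roll_up(sales, keep=(0,))
boston_laptops = drill_down(sales, {0: "Boston", 1: "laptop"})
product_first = pivot(sales, order=(1, 0, 2))
```

Here `by_city` holds one total per city, `boston_laptops` returns the two dated cells behind the Boston laptop total, and `product_first` is the same data keyed as (product, city, date).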
  12. 12. Database design <ul><li>Two ways to represent the multidimensional model </li></ul><ul><ul><li>Star schema </li></ul></ul><ul><ul><ul><li>The database consists of a single fact table and a single table for each dimension </li></ul></ul></ul><ul><ul><ul><li>Each tuple in the fact table consists of a pointer (foreign key) to each of the dimensions </li></ul></ul></ul><ul><ul><li>Snowflake schema </li></ul></ul><ul><ul><ul><li>Refinement of the star schema </li></ul></ul></ul><ul><ul><ul><li>The dimensional hierarchy is explicitly represented by normalizing the dimension tables </li></ul></ul></ul>
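A minimal star schema can be sketched with Python's built-in `sqlite3` module: one fact table whose rows point to one table per dimension. The table names, column names and sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension.
CREATE TABLE dim_city (city_id INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Single fact table: foreign keys to the dimensions plus the measure.
CREATE TABLE fact_sales (
    city_id    INTEGER REFERENCES dim_city(city_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    sale_date  TEXT,
    amount     REAL
);

INSERT INTO dim_city VALUES (1, 'Boston', 'MA'), (2, 'Austin', 'TX');
INSERT INTO dim_product VALUES (10, 'laptop', 'computers'), (11, 'phone', 'mobile');
INSERT INTO fact_sales VALUES
    (1, 10, '2024-01-05', 1200.0),
    (1, 11, '2024-01-05', 800.0),
    (2, 10, '2024-01-06', 1000.0);
""")

# A typical star join: total sales per product category.
rows = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

In a snowflake variant of this sketch, `category` would move out of `dim_product` into its own table, and `dim_product` would hold a key pointing to it.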
  13. 13. Warehouse Servers <ul><li>Specialized SQL servers </li></ul><ul><ul><li>Provide advanced query language and query processing support for SQL queries over star and snowflake schemas </li></ul></ul><ul><ul><li>Example: Red Brick </li></ul></ul><ul><li>ROLAP </li></ul><ul><ul><li>Sit between the relational back end and client front end tools </li></ul></ul><ul><ul><li>Extend traditional relational servers to support multidimensional queries </li></ul></ul><ul><ul><li>Example: MicroStrategy </li></ul></ul><ul><li>MOLAP </li></ul><ul><ul><li>Multidimensional storage engine </li></ul></ul><ul><ul><li>Direct mapping of the multidimensional view onto storage </li></ul></ul><ul><ul><li>Example: Essbase from Arbor Inc. </li></ul></ul>
  14. 14. Index structures <ul><li>Bitmap indices </li></ul><ul><ul><li>Use a single bit to indicate a specific value of an attribute </li></ul></ul><ul><ul><li>Example: </li></ul></ul><ul><ul><ul><li>Instead of storing eight characters to record “engineer” as the skill of an employee, use a single bit </li></ul></ul></ul><ul><ul><ul><li>id# Name Skill </li></ul></ul></ul><ul><ul><ul><li>1000 John 1 </li></ul></ul></ul><ul><li>Join indices </li></ul><ul><ul><li>Maintain the relationship between a foreign key and its matching primary keys </li></ul></ul>
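The bitmap idea above can be sketched with Python integers used as bit vectors: each distinct attribute value gets one bitmap, and bit i is set when row i holds that value. Only the row for John (id 1000) comes from the slide; the other rows, the `dept` column and the function names are hypothetical.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of column to an int whose bit i marks row i."""
    index = {}
    for i, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << i)
    return index


def matching_rows(bitmap, n_rows):
    """Return the positions of the rows whose bit is set in the bitmap."""
    return [i for i in range(n_rows) if (bitmap >> i) & 1]


employees = [
    {"id": 1000, "name": "John", "skill": "engineer", "dept": "R&D"},
    {"id": 1001, "name": "Mary", "skill": "analyst", "dept": "R&D"},
    {"id": 1002, "name": "Ravi", "skill": "engineer", "dept": "Sales"},
]

skill_index = build_bitmap_index(employees, "skill")
dept_index = build_bitmap_index(employees, "dept")

# Combining two predicates is a single bitwise AND over the bitmaps.
engineers_in_rnd = skill_index["engineer"] & dept_index["R&D"]
hits = matching_rows(engineers_in_rnd, len(employees))
```

The AND of the two bitmaps leaves only row 0 (John), which shows why bitmaps suit queries that combine several low-cardinality conditions.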
  15. 15. Metadata and warehouse management <ul><li>Metadata is data about data </li></ul><ul><li>Used for building, maintaining, managing and using the data warehouse </li></ul><ul><li>Administrative metadata </li></ul><ul><ul><li>Information about setting up and using the warehouse </li></ul></ul><ul><li>Business metadata </li></ul><ul><ul><li>Business terms and definitions </li></ul></ul><ul><li>Operational metadata </li></ul><ul><ul><li>Information collected during operation of the warehouse </li></ul></ul>
  16. 16. Conclusion <ul><li>Data warehousing is a technology for the future </li></ul><ul><li>A data warehouse enables knowledge workers to make faster and better decisions </li></ul>
  17. 17. References <ul><li>Inmon, W. H. Building the Data Warehouse. </li></ul><ul><li>Kimball, R. The Data Warehouse Toolkit. </li></ul>