Kushal Data Warehousing PPT

In computing, a data warehouse (DW or DWH), also called an enterprise data warehouse (EDW), is a database used for reporting and data analysis. Integrating data from one or more disparate sources creates a central repository of data. Data warehouses store current and historical data and are used to create trend reports for senior management, such as annual and quarterly comparisons.

Published in: Education
  • Let us look at a transaction that transfers money from one account to another. The transaction has to perform two updates, but this should be transparent to the end user. To the user, either the transfer goes through or it doesn't.
    Before and after the transaction, the database should be in a consistent state.
    Each transaction should be made to feel that it is the only transaction executing at that instant.
    After the transaction completes, the changes made to the database should be visible to other transactions.
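The transfer described in this note can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `accounts` table, its columns, and the `transfer` helper are hypothetical names that do not appear in the slides. It shows the two updates either committing together or rolling back together.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Consistency rule assumed for this sketch: no negative balances.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return all balances as a dict, for inspection."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))

# Assumed toy schema: one table of accounts with integer balances.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

ok = transfer(conn, "A", "B", 30)       # both updates are committed
failed = transfer(conn, "A", "B", 500)  # both updates are rolled back
```

From the caller's point of view, the transfer either goes through completely or leaves both balances untouched.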
  • Kushal Data Warehousing PPT

    1. By Kushal Singh, Acute Informatics Pvt
    2. What is Business Intelligence? Business Intelligence (BI) means bringing the right information at the right time to the right people in the right format.
    3. What is Data Warehousing? A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management's decisions.
    4. What is Business Intelligence?
    5. The architecture. [Figure: Typical architecture of a data warehouse. Components shown: operational data sources 1 to n; operational data store (ODS); load manager; warehouse manager; query manager; DBMS holding meta-data, detailed data, lightly summarized data, and highly summarized data; archive/backup data; end-user access tools: reporting, query, application development, and EIS (executive information system) tools, OLAP (online analytical processing) tools, and data mining.]
    6. The benefits of data warehousing
        • Potentially high returns on investment
        • Substantial competitive advantage
        • Increased productivity of corporate decision-makers
    7. Data Warehouse Characteristics. Key characteristics of a data warehouse:
        • Subject-oriented
        • Integrated
        • Time-variant
        • Non-volatile
    8. Subject Oriented
        • Example for an insurance company:
        • Applications area: Auto and Fire Policy Processing Systems; Commercial and Life Insurance Systems; Accounting System; Billing System; Claims Processing System
        • Data warehouse: Policy, Customer, Losses, Premium
    9. Integrated
        • Data is stored once in a single integrated location (e.g. insurance company).
        • [Figure: customer data stored in several databases (Auto Policy Processing System; Fire Policy Processing System; FACTS, LIFE Commercial, Accounting Applications) is brought together in the data warehouse database under Subject = Customer.]
    10. Time-Variant
        • Data is stored as a series of snapshots or views which record how it is collected across time.
        • Data is tagged with some element of time: creation date, as-of date, etc.
        • Data is available online for long periods of time for trend analysis and forecasting, for example five or more years.
    11. Non-Volatile
        • Existing data in the warehouse is not overwritten or updated.
        • [Figure: production applications update, insert, and delete data in the production databases; external sources and production databases feed the data warehouse environment, where the data warehouse database is load and read-only.]
    12. Comparison of OLTP systems and data warehousing systems
        OLTP systems | Data warehousing systems
        Holds current data | Holds historical data
        Stores detailed data | Stores detailed, lightly, and highly summarized data
        Data is dynamic | Data is largely static
        Repetitive processing | Ad hoc, unstructured, and heuristic processing
        High level of transaction throughput | Medium to low level of transaction throughput
        Predictable pattern of usage | Unpredictable pattern of usage
        Transaction-driven | Analysis-driven
        Application-oriented | Subject-oriented
        Supports day-to-day decisions | Supports strategic decisions
        Serves large number of clerical/operational users | Serves relatively low number of managerial users
    13. OLTP: Online Transaction Processing
    14. Online Transaction Processing
        • What is a transaction?
        – A logical unit of work
        – Examples: drawing money from a bank account; booking a seat on an airline
    15. Transactions
        • A transaction is a unit of program execution that accesses and possibly updates various data items.
        • A transaction is a logical unit of work that performs some useful function for a user.
        • At the end of the transaction the system must be:
        – in the prior state (if the transaction fails), or
        – in a state that reflects the successful completion (if the transaction succeeded).
        • May take a database from one consistent
    16. Characteristics of Transactions (ACID): Atomicity, Consistency, Isolation, Durability
    17. OLAP: Online Analytical Processing
    18. Types of OLAP
        • ROLAP (Relational Online Analytical Processing)
        • MOLAP (Multidimensional Online Analytical Processing)
        • HOLAP (Hybrid Online Analytical Processing)
    19. ROLAP
        • ROLAP (Relational Online Analytical Processing)
        • Used for reporting
        • Tools: Report Studio
    20. MOLAP
        • MOLAP (Multidimensional Online Analytical Processing)
        • Used to build cubes
        • Tools: PowerPlay, Transformer
    21. HOLAP
        • HOLAP (Hybrid Online Analytical Processing)
        • Used for data modeling
        • Supports both MOLAP and ROLAP
        • Tools: Framework Manager, Query Studio
    22. Dimensions
        • A dimension is descriptive information about a measure, such as product, location, or customer.
    23. Types of Dimensions
        • Conformed Dimensions
        • Degenerate Dimensions
        • Junk Dimensions
    24. Facts
        • A fact contains measures and IDs.
        • Examples: revenue, cost, amount
    25. Measure Types
        • Additive measures: can be added across all dimensions
        • Non-additive measures: cannot be added across any dimension
        • Semi-additive measures: can be added across some dimensions but not across others
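The three measure types can be illustrated with a small Python sketch. The rows and column names below are made-up sample data, not taken from the slides. Fees paid are treated as additive, account balance as semi-additive (it sums across accounts but not across months), and a net-to-gross ratio as non-additive.

```python
# Hypothetical fact rows: (month, account, fees_paid, end_of_month_balance)
rows = [
    ("2024-01", "A", 5, 100),
    ("2024-01", "B", 3, 50),
    ("2024-02", "A", 5, 120),
    ("2024-02", "B", 3, 60),
]

# Additive: fees can be summed across every dimension (accounts and months).
total_fees = sum(fees for _, _, fees, _ in rows)

# Semi-additive: balances can be summed across accounts within one month...
balance_by_month = {}
for month, _, _, balance in rows:
    balance_by_month[month] = balance_by_month.get(month, 0) + balance

# ...but summing them across months double counts the same money,
# so a period-level figure takes the latest snapshot instead.
latest_balance = balance_by_month[max(balance_by_month)]
naive_sum_over_time = sum(balance_by_month.values())  # not a meaningful figure

# Non-additive: a ratio must be recomputed from its additive components.
net = [30000, 23000]
gross = [50000, 42000]
overall_ratio = sum(net) / sum(gross)
sum_of_ratios = sum(n / g for n, g in zip(net, gross))  # exceeds 1, meaningless
```

Reporting tools typically let the modeler declare how each measure aggregates; the sketch only shows why the distinction matters.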
    26. Schemas in Data Warehousing
        • Star schema
        • Snowflake schema
        • Starflake schema
    27. Star Schema

        Fact table: Monthly_Sales_Summary_Table
        month | prod_id | region_id | account_id | vend_id | net_sales | gross_sales
        01-1996 | 100 | SW | 100000 | 100 | 30,000 | 50,000
        02-1996 | 140 | NE | 110000 | 200 | 23,000 | 42,000
        03-1996 | 220 | SW | 100000 | 300 | 32,000 | 49,000

        Dimension table: Region_Dimension_Table
        region_id | region_doc
        NE | Northeast
        NW | Northwest
        SE | Southeast
        SW | Southwest

        Dimension table: Product_Dimension_Table
        prod_grp_id | prod_id | prod_grp_desc | prod_desc
        10 | 100 | Fewer devices | Power supply
        20 | 140 | Circuit boards | Motherboard
        30 | 220 | Components | Co-processor

        Dimension table: Account_Dimension_Table
        account_id | account_doc
        100000 | ABC Electronics
        110000 | Midway Electric
        120000 | Victor Components
        130000 | Washburn, Inc.
        140000 | Zerox

        Dimension table: Time_Dimension_Table
        month | mo_in_fiscal_yr | month_name
        01-1996 | 4 | January
        02-1996 | 5 | February
        03-1996 | 6 | March

        Dimension table: Vendor_Dimension_Table
        vend_id | vendor_desc
        100 | PowerAge, Inc.
        200 | Advanced Micro Devices
        300 | Farad Incorporated
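As a rough sketch of how a star schema is queried, the fact rows and the region dimension from this slide can be loaded into an in-memory SQLite database and joined. The shortened table names `monthly_sales_summary` and `region_dim` are assumptions made for this example; the column names and values follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region_dim (
    region_id  TEXT PRIMARY KEY,
    region_doc TEXT
);
CREATE TABLE monthly_sales_summary (
    month       TEXT,
    prod_id     INTEGER,
    region_id   TEXT REFERENCES region_dim(region_id),
    account_id  INTEGER,
    vend_id     INTEGER,
    net_sales   INTEGER,
    gross_sales INTEGER
);
INSERT INTO region_dim VALUES
    ('NE', 'Northeast'), ('NW', 'Northwest'),
    ('SE', 'Southeast'), ('SW', 'Southwest');
INSERT INTO monthly_sales_summary VALUES
    ('01-1996', 100, 'SW', 100000, 100, 30000, 50000),
    ('02-1996', 140, 'NE', 110000, 200, 23000, 42000),
    ('03-1996', 220, 'SW', 100000, 300, 32000, 49000);
""")

# Typical star-schema query: join the fact table to one dimension
# and aggregate a measure by a descriptive attribute.
net_sales_by_region = conn.execute("""
    SELECT r.region_doc, SUM(f.net_sales)
    FROM monthly_sales_summary AS f
    JOIN region_dim AS r ON r.region_id = f.region_id
    GROUP BY r.region_doc
    ORDER BY r.region_doc
""").fetchall()
```

The other dimensions (product, account, time, vendor) would be joined the same way, each through its own key in the fact table.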
    28. Snowflake Schema
    29. Factless Fact Table
        • It is just a bridge table used to join other tables; it holds no measures.
        • In this scenario we can only track that an event occurred.
    30. SCD (Slowly Changing Dimensions)
        • Type 0
        • Type 1
        • Type 2
        • Type 3
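The slide only names the SCD types. As commonly described, Type 0 keeps the original value, Type 1 overwrites it, Type 2 adds a new versioned row, and Type 3 keeps the previous value in an extra column. The sketch below contrasts Type 1 and Type 2 on a toy customer dimension; the field names (`customer_id`, `city`, `valid_from`, `valid_to`) and both helper functions are hypothetical.

```python
from datetime import date

def apply_type1(rows, key, new_city):
    """Type 1: overwrite the attribute in place; no history is kept."""
    for row in rows:
        if row["customer_id"] == key:
            row["city"] = new_city

def apply_type2(rows, key, new_city, change_date):
    """Type 2: close the current row and append a new current row."""
    for row in rows:
        if row["customer_id"] == key and row["valid_to"] is None:
            row["valid_to"] = change_date
    rows.append(
        {"customer_id": key, "city": new_city,
         "valid_from": change_date, "valid_to": None}
    )

# Type 1 dimension: one row per customer, always the latest value.
dim_t1 = [{"customer_id": 1, "city": "Pune"}]
apply_type1(dim_t1, 1, "Mumbai")

# Type 2 dimension: one row per version; valid_to is None for the current row.
dim_t2 = [{"customer_id": 1, "city": "Pune",
           "valid_from": date(2020, 1, 1), "valid_to": None}]
apply_type2(dim_t2, 1, "Mumbai", date(2024, 6, 1))
```

After the change, the Type 1 table has forgotten "Pune", while the Type 2 table still holds it as a closed historical row.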
    31. ETL (Extract, Transform, and Load): Informatica
    32. Designing: Framework Manager, Relational Database & DMR
    33. Reporting: IBM Cognos
