Introduction to data warehousing



  2. What is data warehousing?
     • A data warehouse is a database used for reporting and analysis.
     • It is an integrated collection of enterprise-wide data, oriented to decision making.
     • It provides strategic information.
     • It supports information analysis that cannot be done by operational systems.
  3. 3. Need for data warehousingMaintain data historyEven if the source transaction systems do not. Integrate data from multiple source systems,Improve data quality by providing consistent codes and descriptionsProvides a flexible, conducive and interactive source of strategic informationPerforming Information analysis that could not done by operating system
  4. Data rich, but information poor
     • Data is stored, not explored: because of its volume and complexity, it is a burden rather than a support.
     • Data overload results in uninformed decisions, contradictory information, higher overhead, wrong decisions and increased costs.
     • Data is not designed or structured for successful management decision making.
  5. Improving decision making
     [Figure: diagram linking data, the data warehouse, information and decisions]
  6. 6. Operational data storesData focuses on transaction functions such as bank card withdrawals and depositsIt is organised by application ODS It contains the current valuesIt supports day-to-day operational decision supports information it is detailed , nonredundant and updateable
  7. Informational data stores
     • It is organised around subjects such as customer and product.
     • It is summarized, archived and derived.
     • Data is static until refreshed.
     • Data is nonupdateable.
  8. Difference between operational and informational data stores

     Characteristic      Operational data                    Informational data
     ------------------  ----------------------------------  -----------------------------
     Data content        Current values                      Summarized, archived, derived
     Data organization   By application                      By subject
     Data stability      Dynamic                             Static until refreshed
     Data structure      Optimized for transactions          Optimized for complex queries
     Access frequency    High                                Medium to low
     Access type         Read/update/delete, field by field  Read/aggregate, added to
     Response time       Subsecond (<1 s) to 2-3 s           Several seconds to minutes
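The contrast in the table can be sketched with SQLite standing in for the relational DBMS. This is a minimal sketch under stated assumptions: the tables `account` and `balance_snapshot` and the functions `withdraw`, `refresh` and `total_by_date` are hypothetical names chosen for illustration. The operational table is updated row by row and keeps only current values; the informational table is only added to on refresh and is queried with aggregates.

```python
import sqlite3

# Hypothetical schema: "account" is operational (current values only, updated
# in place); "balance_snapshot" is informational (derived history, appended to
# on each refresh and queried with aggregates).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL NOT NULL);
CREATE TABLE balance_snapshot (
    as_of TEXT NOT NULL,
    account_id INTEGER NOT NULL,
    balance REAL NOT NULL,
    PRIMARY KEY (as_of, account_id)
);
""")


def withdraw(account_id: int, amount: float) -> None:
    """Operational access: update one row; the previous value is lost."""
    conn.execute(
        "UPDATE account SET balance = balance - ? WHERE account_id = ?",
        (amount, account_id),
    )


def refresh(as_of: str) -> None:
    """Informational refresh: copy current values into the history table."""
    conn.execute(
        "INSERT INTO balance_snapshot SELECT ?, account_id, balance FROM account",
        (as_of,),
    )


def total_by_date() -> list:
    """Informational access: read and aggregate many rows at once."""
    return conn.execute(
        "SELECT as_of, SUM(balance) FROM balance_snapshot "
        "GROUP BY as_of ORDER BY as_of"
    ).fetchall()


conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
refresh("2024-01-01")
withdraw(1, 30.0)
refresh("2024-01-02")
print(total_by_date())  # [('2024-01-01', 150.0), ('2024-01-02', 120.0)]
```

After the withdrawal, the operational table only knows that account 1 holds 70.0, while the informational side can still answer how the total changed between refreshes.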
  9. Definition of a data warehouse
     • A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process.
     • A data warehouse is designed for easy access by users to large amounts of information, and data access is typically supported by specialized analytical tools and applications.
  10. 10. Data Warehouse CharacteristicsIt is database designed for analytical tasks, using data from multiple applicationIt supports a relatively small numbers of users with relatively long interactionIts content is periodically updatedIt contains current and historical data to provide a historical perspective of informationIt contains a few large tables
  11. Integrated
      • Data is stored once in a single integrated location.
      [Figure: insurance company example. Customer data stored in several databases by the Auto Policy Processing System, the Fire Policy Processing System and the FACTS, LIFE, Commercial and Accounting applications is brought together in one data warehouse database under the subject Customer.]
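Integration mostly means mapping each source system's keys and codes onto one enterprise-wide set. The sketch below is illustrative only: the two extracts, the cross-reference tables `KEY_MAP` and `GENDER_CODES`, and the rule that the first source seen supplies the cleaned name are all hypothetical assumptions, not part of the original slides.

```python
from dataclasses import dataclass

# Hypothetical extracts: two policy systems store the same customer under
# different keys, name formats and gender codes.
auto_policy_rows = [{"cust_no": "A-17", "name": "Ann Lee ", "sex": "F"}]
fire_policy_rows = [{"client_id": 17, "full_name": "ANN LEE", "gender": 2}]

# Hypothetical cross-reference tables built during integration.
KEY_MAP = {("auto", "A-17"): 17, ("fire", 17): 17}
GENDER_CODES = {("auto", "M"): "male", ("auto", "F"): "female",
                ("fire", 1): "male", ("fire", 2): "female"}


@dataclass(frozen=True)
class Customer:
    customer_key: int
    name: str
    gender: str
    sources: frozenset


def integrate(auto_rows, fire_rows):
    """Map every source row to one enterprise key and one set of codes."""
    merged = {}

    def add(source, source_key, name, gender_code):
        key = KEY_MAP[(source, source_key)]
        # The first source seen supplies the cleaned name and decoded gender.
        rec = merged.setdefault(key, {
            "name": " ".join(name.split()).title(),
            "gender": GENDER_CODES[(source, gender_code)],
            "sources": set(),
        })
        rec["sources"].add(source)

    for r in auto_rows:
        add("auto", r["cust_no"], r["name"], r["sex"])
    for r in fire_rows:
        add("fire", r["client_id"], r["full_name"], r["gender"])
    return [Customer(k, v["name"], v["gender"], frozenset(v["sources"]))
            for k, v in sorted(merged.items())]


print(integrate(auto_policy_rows, fire_policy_rows))
```

Both source rows collapse into a single `Customer` with key 17, one spelling of the name and one gender code, which is what "stored once in a single integrated location" asks for.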
  12. Time-variant
      • Data is stored as a series of snapshots or views that record the data as it is collected over time.
      • Data is tagged with some element of time: creation date, as-of date, etc.
      • Data is available online for long periods of time (for example, five or more years) for trend analysis and forecasting.
      [Figure: a data warehouse record whose key combines a time element with the data]
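Because the time element is part of the key, a warehouse can answer "what was the value as of date X?" as well as return the whole series for trend analysis. A minimal sketch, assuming a hypothetical in-memory `snapshots` dictionary keyed by `(customer_id, as_of)`; a real warehouse would hold these rows in a table.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical snapshots of one customer's credit limit. The time element
# (the "as of" date) is part of the key, so old values are kept, not replaced.
snapshots = {
    ("C1", date(2020, 1, 1)): 1000,
    ("C1", date(2021, 1, 1)): 1500,
    ("C1", date(2023, 6, 1)): 2500,
}


def value_as_of(customer_id, when):
    """Return the most recent snapshot taken on or before `when`, else None."""
    dates = sorted(d for (c, d) in snapshots if c == customer_id)
    i = bisect_right(dates, when)
    return snapshots[(customer_id, dates[i - 1])] if i else None


def history(customer_id):
    """All snapshots in time order: the series used for trend analysis."""
    return [(d, v) for (c, d), v in sorted(snapshots.items()) if c == customer_id]


print(value_as_of("C1", date(2022, 3, 15)))  # 1500
```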
  13. Non-volatile
      • Existing data in the warehouse is not overwritten or updated.
      [Figure: production databases and external sources feed the data warehouse database. In the production applications environment data is inserted, updated and deleted; in the warehouse environment it is loaded and then only read.]
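One way to illustrate non-volatility is to let loads append rows while refusing updates and deletes. The sketch below uses SQLite triggers with `RAISE(ABORT, ...)` as the guard; the `sales_fact` table, the trigger names and the `load` and `try_modify` helpers are hypothetical, and many warehouses enforce this through permissions or load processes rather than triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (sale_date TEXT, product TEXT, amount REAL);
-- Hypothetical guards: rows can be loaded and read, never changed or removed.
CREATE TRIGGER sales_fact_no_update BEFORE UPDATE ON sales_fact
BEGIN SELECT RAISE(ABORT, 'warehouse data is read-only'); END;
CREATE TRIGGER sales_fact_no_delete BEFORE DELETE ON sales_fact
BEGIN SELECT RAISE(ABORT, 'warehouse data is read-only'); END;
""")


def load(rows):
    """The only write path: append a new batch in one transaction."""
    with conn:
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)


def try_modify(sql):
    """Return True if the statement succeeded, False if a trigger rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError as exc:
        print("rejected:", exc)
        return False


load([("2024-01-01", "auto", 120.0), ("2024-01-01", "fire", 80.0)])
try_modify("UPDATE sales_fact SET amount = 0")
try_modify("DELETE FROM sales_fact WHERE product = 'fire'")
```

Both modification attempts are rejected and the two loaded rows stay exactly as they were loaded.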
  14. Subject-oriented
      [Figure: insurance company example. Applications area: Auto and Fire Policy Processing Systems, Commercial and Life Insurance Systems, Claims Processing System, Billing System, Accounting System. Data warehouse subjects: customer data, policy data, premium, losses.]
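The insurance example moves from data organised by application to data organised by subject. A minimal sketch of that regrouping, assuming hypothetical extracts from three of the systems named in the figure and hypothetical field names (`customer`, `policy`, `premium`, `loss`):

```python
from collections import defaultdict

# Hypothetical extracts: each application stores a mix of subjects in its own
# record layout.
extracts = {
    "auto_policy_system": [{"customer": "C1", "policy": "P1", "premium": 500.0}],
    "claims_processing_system": [{"customer": "C1", "policy": "P1", "loss": 1200.0}],
    "billing_system": [{"customer": "C2", "policy": "P2", "premium": 300.0}],
}


def by_subject(source_extracts):
    """Regroup application-oriented rows into subject-oriented collections."""
    customers, policies = set(), set()
    premium = defaultdict(float)  # premium per policy, from whichever system has it
    losses = defaultdict(float)   # losses per policy
    for rows in source_extracts.values():
        for row in rows:
            customers.add(row["customer"])
            policies.add(row["policy"])
            if "premium" in row:
                premium[row["policy"]] += row["premium"]
            if "loss" in row:
                losses[row["policy"]] += row["loss"]
    return {
        "customer": sorted(customers),
        "policy": sorted(policies),
        "premium": dict(premium),
        "losses": dict(losses),
    }


print(by_subject(extracts))
```

The output no longer says which application a value came from; it is keyed by the business subjects the analysts ask about.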
  15. 15. Data Warehouse ArchitectureIt is based on a relational database management system server that function as the central repository for informational data
  16. [Figure: data warehouse architecture. Components: operational systems, conversion & interface, staging area, ODS, data warehouse, data marts, OLAP cubes, canned reports, ad-hoc reporting.]
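The downstream components in the architecture diagram (data marts and OLAP cubes) both derive from warehouse fact rows. The sketch below is a simplified illustration, not the deck's design: it treats a data mart as a filtered subset of the facts and computes cube cells as sums over every combination of dimensions, similar to SQL `GROUP BY CUBE`. The dimensions `region`, `product`, `year` and the `amount` measure are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows in the warehouse.
DIMENSIONS = ("region", "product", "year")
facts = [
    {"region": "north", "product": "auto", "year": 2023, "amount": 100.0},
    {"region": "north", "product": "fire", "year": 2023, "amount": 40.0},
    {"region": "south", "product": "auto", "year": 2024, "amount": 60.0},
]


def cube(rows, dims=DIMENSIONS, measure="amount"):
    """Sum `measure` for every subset of `dims`; unused dimensions show "ALL"."""
    result = defaultdict(float)
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            for row in rows:
                cell = tuple(row[d] if d in group else "ALL" for d in dims)
                result[cell] += row[measure]
    return dict(result)


def data_mart(rows, **filters):
    """A subset of warehouse rows for one business area, e.g. product="auto"."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


totals = cube(facts)
print(totals[("ALL", "ALL", "ALL")], totals[("north", "ALL", "ALL")])
```

Canned reports and ad-hoc queries would then read precomputed cells such as `("north", "ALL", "ALL")` instead of scanning every fact row.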
  17. 17. Data Warehouse ArchitectureThe source data for it is operational applicationDuring processing data is transformed into an integrated structure and formatThe transformation process may involve conversion, summarization, filtering and condensation of data
  18. 18. References:Introduction to data warehousing ehousingppt arehousing.html talk-impact.ppt