Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Introduction to Data Warehousing


Published on

Published in: Technology, Business
  • Be the first to comment

Introduction to Data Warehousing

  2. 2. INTRODUCTION • Data: Meaningful facts, text, graphics, images, sound, video segments. • Database: An organized collection of logically related data. • Information: Data processed to be useful in decision making. • Metadata: Data that describes data.
  3. 3. ADVANTAGES OF THE DATABASE APPROACH • Data Independence/Reduced Maintenance • Improved Data Sharing • Increased Application Development Productivity • Enforcement of Standards • Improved Data Quality (Constraints) • Better Data Accessibility/ Responsiveness • Security, Backup/Recovery, Concurrency
  4. 4. PROBLEM: HETEROGENEOUS INFORMATION SOURCES “Heterogeneities are everywhere” Personal Databases Digital Libraries Scientific Databases World Wide Web  Different interfaces  Different data representations  Duplicate and inconsistent information
  5. 5. PROBLEM: DATA MANAGEMENT IN LARGE ENTERPRISES  fragmentation of informational systems  Result of application (user)-driven development of operational systems Sales Administration Finance Manufacturing ... Sales Planning Stock Mngmt ... Suppliers ... Debt Mngmt Num. Control ... Inventory
  6. 6. SOLUTION: UNIFIED ACCESS TO DATA Integration System Collects and combines information Provides integrated view, uniform user interface Supports sharing World Wide Web Digital Libraries Scientific Databases Personal Databases
  7. 7. WHAT IS A DATA WAREHOUSE? “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
  8. 8. WHAT IS A DATA WAREHOUSE? “A DW is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.”
  9. 9. A DATA WAREHOUSE IS Stored collection of diverse data A solution to data integration problem Single repository of information Subject-oriented Organized by subject, not by application Used for analysis, data mining, etc. Optimized differently from transaction- oriented db
  10. 10. A DATA WAREHOUSE IS Large volume of data (Gb, Tb) Non-volatile Historical Time attributes are important Updates infrequent
  11. 11. OLTP VS. OLAP OLTP: On Line Transaction Processing Describes processing at operational sites OLAP: On Line Analytical Processing Describes processing at warehouse
  12. 12. WAREHOUSE IS A SPECIALIZED DB Standard DB (OLTP)  Mostly updates  Many small transactions  Mb - Gb of data  Current snapshot  Raw data  Thousands of users (e.g., clerical users) Warehouse (OLAP) Mostly reads Queries are long and complex Gb - Tb of data History Summarized data Hundreds of users (e.g., decision- makers, analysts)
  13. 13. GENERIC WAREHOUSE ARCHITECTURE Extractor/ Monitor Extractor/ Monitor Extractor/ Monitor Integrator Warehouse Client Client Design Phase Maintenance Loading ... Metadata Optimization Query & Analysis
  15. 15. ETL CONCEPT
  16. 16. ETL CONCEPT
  17. 17. ISSUES IN DATA WAREHOUSING Warehouse Design Extraction Wrappers, monitors (change detectors) Integration Cleansing & merging Warehousing specification & Maintenance Optimizations