INTRODUCTION TO
DATA WAREHOUSING
By: Eng. Eyad R. Manaa
INTRODUCTION
• Data: Meaningful facts, text, graphics, images,
sound, video segments.
• Database: An organized collection of logically
related data.
• Information: Data processed to be useful in
decision making.
• Metadata: Data that describes data.
ADVANTAGES OF THE DATABASE APPROACH
• Data Independence/Reduced Maintenance
• Improved Data Sharing
• Increased Application Development Productivity
• Enforcement of Standards
• Improved Data Quality (Constraints)
• Better Data Accessibility/Responsiveness
• Security, Backup/Recovery, Concurrency
PROBLEM: HETEROGENEOUS INFORMATION SOURCES
“Heterogeneities are everywhere”
[Figure: information sources: Personal Databases, Digital Libraries, Scientific Databases, World Wide Web]
• Different interfaces
• Different data representations
• Duplicate and inconsistent information
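The "different data representations" and "duplicate information" points can be illustrated with a small sketch. The two records, their field names, and the country lookup table below are all made up for illustration; the idea is only that each source needs its own mapping onto a common form before duplicates can be recognized.

```python
from datetime import date

# Hypothetical records for the same customer held in two independent sources.
crm_record = {"name": "Jane Doe", "joined": "1990-03-15", "country": "UK"}
billing_record = {"customer": "DOE, JANE", "since": "15/03/1990", "ctry": "United Kingdom"}

# Assumed lookup table that maps each source's country spelling to one code.
COUNTRY_CODES = {"uk": "GB", "united kingdom": "GB"}

def from_crm(rec):
    """Map the CRM layout onto the common (name, joined, country) form."""
    y, m, d = (int(part) for part in rec["joined"].split("-"))
    return {
        "name": rec["name"].lower(),
        "joined": date(y, m, d),
        "country": COUNTRY_CODES[rec["country"].lower()],
    }

def from_billing(rec):
    """Map the billing layout ("LAST, FIRST", dd/mm/yyyy) onto the common form."""
    last, first = (part.strip() for part in rec["customer"].split(","))
    d, m, y = (int(part) for part in rec["since"].split("/"))
    return {
        "name": f"{first} {last}".lower(),
        "joined": date(y, m, d),
        "country": COUNTRY_CODES[rec["ctry"].lower()],
    }

a = from_crm(crm_record)
b = from_billing(billing_record)
is_duplicate = a == b  # only comparable once both are in the common form
```

Comparing the raw dictionaries directly would report two unrelated customers; the comparison becomes meaningful only after normalization.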
PROBLEM: DATA MANAGEMENT IN LARGE ENTERPRISES
• Fragmentation of informational systems
• Result of application (user)-driven development of operational systems
[Figure: operational systems by department: Sales, Administration, Finance, Manufacturing, ...; applications shown: Sales Planning, Stock Mngmt, Suppliers, Debt Mngmt, Num. Control, Inventory, ...]
SOLUTION: UNIFIED ACCESS TO DATA
Integration System
• Collects and combines information
• Provides integrated view, uniform user interface
• Supports sharing
[Figure: Integration System connected to the sources: World Wide Web, Digital Libraries, Scientific Databases, Personal Databases]
WHAT IS A DATA WAREHOUSE?
“A data warehouse is simply a single,
complete, and consistent store of data
obtained from a variety of sources and
made available to end users in a way they
can understand and use it in a business
context.”
WHAT IS A DATA WAREHOUSE?
“A DW is a
subject-oriented,
integrated,
time-varying,
non-volatile
collection of data that is used primarily in
organizational decision making.”
A DATA WAREHOUSE IS
Stored collection of diverse data
A solution to data integration problem
Single repository of information
Subject-oriented
Organized by subject, not by application
Used for analysis, data mining, etc.
Optimized differently from transaction-oriented databases
A DATA WAREHOUSE IS
Large volume of data (GB, TB)
Non-volatile
Historical
Time attributes are important
Updates infrequent
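The "non-volatile" and "historical" properties can be sketched as an append-only store in which every fact carries a date. This is a minimal illustration, not a warehouse design: the product, dates, and prices are made up, and the function names are hypothetical.

```python
from datetime import date

# Append-only history: each row is (product, valid_from, price).
price_history = []

def record_price(product, valid_from, price):
    """Add a new dated fact; earlier rows are never modified or deleted."""
    price_history.append((product, valid_from, price))

def price_as_of(product, day):
    """Return the price in effect on `day`, or None if none was recorded yet."""
    candidates = [
        (start, price)
        for p, start, price in price_history
        if p == product and start <= day
    ]
    # The latest start date not after `day` wins.
    return max(candidates)[1] if candidates else None

record_price("widget", date(2023, 1, 1), 9.50)
record_price("widget", date(2023, 7, 1), 10.25)

old_price = price_as_of("widget", date(2023, 3, 10))
new_price = price_as_of("widget", date(2023, 8, 1))
```

Because the July price was added rather than written over the January one, a question about March can still be answered, which is why time attributes matter.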
OLTP VS. OLAP
OLTP: Online Transaction Processing
Describes processing at operational sites
OLAP: Online Analytical Processing
Describes processing at the warehouse
WAREHOUSE IS A SPECIALIZED DB
Standard DB (OLTP)
• Mostly updates
• Many small transactions
• MB - GB of data
• Current snapshot
• Raw data
• Thousands of users (e.g., clerical users)
Warehouse (OLAP)
• Mostly reads
• Queries are long and complex
• GB - TB of data
• History
• Summarized data
• Hundreds of users (e.g., decision-makers, analysts)
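The contrast in workload can be sketched with Python's built-in sqlite3 module. The table name, columns, and figures are assumptions made for illustration, and a single in-memory database stands in for what would normally be separate operational and warehouse systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
    [
        ("north", 2022, 100.0),
        ("north", 2023, 150.0),
        ("south", 2022, 80.0),
        ("south", 2023, 120.0),
    ],
)

# OLTP-style work: a small transaction that updates one row by key.
with conn:
    conn.execute("UPDATE orders SET amount = amount + 10 WHERE id = ?", (1,))

# OLAP-style work: a read-only query that scans history and summarizes it.
totals = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()
```

The update touches a single row and finishes immediately; the summary query reads every row, which is the access pattern a warehouse is optimized for.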
GENERIC WAREHOUSE ARCHITECTURE
[Figure: generic warehouse architecture. Components: Extractor/Monitor (x3), Integrator, Warehouse, Metadata, Client (x2). Labels: Design Phase, Maintenance, Loading, Optimization, Query & Analysis, ...]
• ETL Concept
WAREHOUSING PROCESS
ETL CONCEPT
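ETL stands for extract, transform, load. The slide itself carries only a diagram, so the following is a minimal sketch of the three steps under stated assumptions: the source rows, the `sales_fact` table, and the cleaning rules are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

def extract():
    """Extract: rows as they might arrive from an operational source (made-up data)."""
    return [
        {"sku": " a1 ", "qty": "3", "unit_price": "2.50"},
        {"sku": "b2", "qty": "1", "unit_price": "10.00"},
        {"sku": "", "qty": "5", "unit_price": "1.00"},  # missing key, rejected below
    ]

def transform(rows):
    """Transform: clean keys, cast types, derive revenue, drop unusable rows."""
    cleaned = []
    for row in rows:
        sku = row["sku"].strip().upper()
        if not sku:
            continue  # a row without a key cannot be loaded
        qty = int(row["qty"])
        price = float(row["unit_price"])
        cleaned.append((sku, qty, qty * price))
    return cleaned

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact (sku TEXT, qty INTEGER, revenue REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
loaded = warehouse.execute(
    "SELECT sku, qty, revenue FROM sales_fact ORDER BY sku"
).fetchall()
```

Keeping the three steps as separate functions mirrors the way the stages are usually described, and lets each one be replaced independently.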
ISSUES IN DATA WAREHOUSING
• Warehouse Design
• Extraction: wrappers, monitors (change detectors)
• Integration: cleansing & merging
• Warehousing specification & Maintenance
• Optimizations
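One simple way a monitor (change detector) can work is by comparing successive snapshots of a source; other approaches, such as triggers or reading the source's log, are also common. The sketch below assumes snapshots are available as dictionaries keyed by a row identifier; the function name and the sample rows are hypothetical.

```python
def detect_changes(old_snapshot, new_snapshot):
    """Compare two {key: row} snapshots and report inserts, updates, and deletes."""
    inserted = {k: v for k, v in new_snapshot.items() if k not in old_snapshot}
    deleted = {k: v for k, v in old_snapshot.items() if k not in new_snapshot}
    updated = {
        k: (old_snapshot[k], v)  # (row before, row after)
        for k, v in new_snapshot.items()
        if k in old_snapshot and old_snapshot[k] != v
    }
    return inserted, updated, deleted

# Made-up source contents on two consecutive days.
yesterday = {1: ("pen", 1.20), 2: ("ink", 4.00), 3: ("pad", 2.50)}
today = {1: ("pen", 1.20), 2: ("ink", 4.50), 4: ("clip", 0.30)}

inserted, updated, deleted = detect_changes(yesterday, today)
```

Only the three changed rows need to be passed on to the integrator; the unchanged row with key 1 is not reported.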
