This document provides an overview of data warehousing. It defines a data warehouse as a central database that includes information from several different sources and keeps both current and historical data to support management decision making. The document describes key characteristics of a data warehouse including being subject-oriented, integrated, time-variant, and non-volatile. It also discusses common data warehouse architectures and applications.
Data warehouse
• A data warehouse is a system for storing and
analyzing data, and for reporting on it.
• Central database that includes information from several
different sources.
• Keeps current as well as historical data.
“Data Warehouse is a subject oriented, integrated,
time-variant and non-volatile collection of data in support of
management’s decision making process.”
– W. H. Inmon
Data Warehouse—Subject-Oriented
• Organized around major subjects, such as customer,
product, sales
• Focusing on the modeling and analysis of data for
decision makers, not on daily operations or
transaction processing
• Provides a simple and concise view around particular
subject issues by excluding data that are not useful in
the decision support process
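A minimal sketch of a subject-oriented view, assuming hypothetical operational records (the field names `txn_id`, `terminal`, `customer`, `product`, and `amount` are illustrative, not from the slides). It keeps only what a decision maker needs per subject and ignores operational details such as the terminal:

```python
from collections import defaultdict

# Hypothetical operational records; all field names are illustrative.
transactions = [
    {"txn_id": 1, "terminal": "T7", "customer": "alice", "product": "widget", "amount": 20.0},
    {"txn_id": 2, "terminal": "T2", "customer": "bob", "product": "gadget", "amount": 35.0},
    {"txn_id": 3, "terminal": "T7", "customer": "alice", "product": "gadget", "amount": 35.0},
]

def subject_view(records, subject):
    """Total sales amount per value of the chosen subject (e.g. customer)."""
    totals = defaultdict(float)
    for rec in records:
        # Operational fields (txn_id, terminal) are deliberately left out.
        totals[rec[subject]] += rec["amount"]
    return dict(totals)

by_customer = subject_view(transactions, "customer")
by_product = subject_view(transactions, "product")
```

The same records yield one view per subject, which is the sense in which the warehouse is organized around subjects rather than around transactions.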
Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data
sources
– relational databases, flat files, on-line transaction records
• Data cleaning and data integration techniques are applied.
– Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
• E.g., hotel price: sources may differ in currency, tax, whether breakfast is covered, etc.
– When data is moved to the warehouse, it is converted to the warehouse's common conventions.
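The hotel price example can be sketched as a conversion step. This is an illustration under stated assumptions: the record layout, the `normalize` helper, and the exchange rates in `RATES_TO_USD` are all hypothetical, and the rates are made-up values rather than real ones.

```python
from dataclasses import dataclass

# Hypothetical fixed exchange rates to USD; illustrative values only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25}

@dataclass(frozen=True)
class HotelPrice:
    hotel: str
    price_usd: float          # nightly price in USD, tax included
    breakfast_included: bool

def normalize(record: dict) -> HotelPrice:
    """Convert one source record to a single common convention."""
    amount = record["amount"] * RATES_TO_USD[record["currency"]]
    if not record["tax_included"]:
        amount *= 1 + record["tax_rate"]   # make every price tax-inclusive
    return HotelPrice(
        hotel=record["hotel"].strip().title(),   # consistent naming
        price_usd=round(amount, 2),
        breakfast_included=bool(record["breakfast"]),
    )

# Two hypothetical sources describing the same hotel in different ways.
source_a = {"hotel": "grand plaza ", "amount": 100.0, "currency": "USD",
            "tax_included": False, "tax_rate": 0.10, "breakfast": 0}
source_b = {"hotel": "Grand Plaza", "amount": 88.0, "currency": "EUR",
            "tax_included": True, "tax_rate": 0.10, "breakfast": 1}

rows = [normalize(source_a), normalize(source_b)]
```

After conversion, both rows use the same hotel name, currency, and tax treatment, so their prices can be compared directly.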
Data Warehouse—Time Variant
• The time horizon for the data warehouse is significantly longer
than that of operational systems
– Operational database: current value data
– Data warehouse data: provides information from a historical
perspective (e.g., past 5-10 years)
• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not contain a
“time element”
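One way to picture the difference in key structure, as a sketch only: the two dictionaries and the `record_balance` and `history` helpers below are hypothetical stand-ins for an operational table and a warehouse table.

```python
from datetime import date

operational = {}   # key: customer_id -> current value only
warehouse = {}     # key: (customer_id, snapshot_date) -> value at that time

def record_balance(customer_id, balance, as_of):
    operational[customer_id] = balance           # overwrites the old value
    warehouse[(customer_id, as_of)] = balance    # time element is part of the key

def history(customer_id):
    """All warehouse snapshots for one customer, oldest first."""
    return sorted((d, v) for (c, d), v in warehouse.items() if c == customer_id)

record_balance("C1", 500, date(2023, 1, 31))
record_balance("C1", 750, date(2023, 2, 28))
```

Here `operational["C1"]` holds only the latest balance, while `history("C1")` returns both snapshots because each one has its own time-stamped key.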
Data Warehouse—Nonvolatile
• A physically separate store of data transformed from the
operational environment
• Operational update of data does not occur in the data
warehouse environment
– Does not require transaction processing, recovery, and
concurrency control mechanisms
– Requires only two operations in data accessing:
• initial loading of data and access of data
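The two operations can be sketched as an append-only table. The `WarehouseTable` class is a hypothetical illustration, not a real library API: it offers loading and reading, and intentionally has no update or delete method.

```python
class WarehouseTable:
    """Append-only table: rows can be loaded and read, never updated in place."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Store each row as an immutable tuple.
        self._rows.extend(tuple(r) for r in rows)

    def query(self, predicate=lambda row: True):
        # Read access returns a new list; the stored rows are not modified.
        return [row for row in self._rows if predicate(row)]

sales = WarehouseTable()
sales.load([("2023-01", "widget", 120), ("2023-02", "widget", 95)])
january = sales.query(lambda r: r[0] == "2023-01")
```

Because nothing is updated in place, a structure like this needs none of the transaction processing or concurrency control machinery an operational database requires.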
Data Warehouse Architecture
[Figure: data warehouse architecture, in five layers from left to right]
• Data Sources: System A, System B, System C, System D
• ETL: Extract, Transform, Load
• Data Store: the data warehouse
• Data Access: business model
• Presentation: self serve, prompted views, dashboards, scorecards, ad-hoc reporting
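The flow from data sources through ETL to presentation can be sketched end to end. This is a toy example under assumptions: `system_a` and `system_b` are hypothetical source systems with made-up fields, and an in-memory SQLite database from Python's standard `sqlite3` module stands in for the data store.

```python
import sqlite3

# Hypothetical source systems with inconsistent field names and units.
system_a = [{"cust": "alice", "amount_cents": 1250},
            {"cust": "bob", "amount_cents": 300}]
system_b = [{"customer_name": "Alice", "amount": 7.5}]

def extract():
    """Yield (source, record) pairs from every source system."""
    yield from (("A", r) for r in system_a)
    yield from (("B", r) for r in system_b)

def transform(source, rec):
    """Map each source's layout to one (customer, amount) convention."""
    if source == "A":
        return (rec["cust"].title(), rec["amount_cents"] / 100)
    return (rec["customer_name"].title(), float(rec["amount"]))

def load(conn, rows):
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")   # stand-in for the data store
load(conn, [transform(s, r) for s, r in extract()])

# Presentation layer: a simple report query over the integrated data.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
```

The report query sees one consistent `sales` table and does not need to know how many source systems fed it, which is the point of separating ETL from data access.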
Applications
Industry              Application
Finance               Credit card analysis
Insurance             Claims, fraud analysis
Telecommunication     Call record analysis
Transport             Logistics management
Consumer goods        Promotion analysis
Advantages
• Enhances end-user access to a wide variety of data.
• Increases data consistency.
• Increases productivity and decreases computing costs.
• Combines data from different sources in one place.
• Provides an infrastructure that could support changes to data
and replication of the changed data back into the operational
systems.
Disadvantages
• Extracting, cleaning, and loading data can be time
consuming.
• Compatibility problems with systems already in place,
e.g. transaction processing systems.
• Training must be provided to end users, who may still end up
not using the data warehouse.
• Security could develop into a serious issue, especially if the
data warehouse is web accessible.