2
Chapter # 2
DataWarehouse Architecture & Models
Text Book: Modern Data Warehousing, Mining,
and Visualization: Core Concepts
by George M. Marakas
3.
3
Learning Objectives
The main objective of this chapter is to let the
students know the basic structure of a Data
warehouse.
Learn about Data warehouse physical
architecture and various principles of a Data
warehousing.
To make students understand a data warehouse
model.
It also explains various types of
multidimensional models and Schemas.
4.
4
Major Topics
Structure of a Data Warehouse
Data Warehouse Architectures
Three Data Warehouse Models
Extraction, Transformation and Loading (ETL)
Design of Data Warehouse
Multidimensional Data Model
Data cube
Conceptual Modeling of Data Warehouses
Dimension Schema
5.
5
Structure of aData Warehouse
Essentially, a data warehouse provides historical
data for decision-support applications.
Such applications include reporting, online
analytical processing (OLAP), executive
information systems (EIS), and data mining.
Data warehouse is a centralized, integrated
repository of information.
6.
6
Here, integratedmeans cleaned up, merged, and
redesigned. This may be more or less
complicated
Depending on how many systems feed into a
warehouse and how widely they differ in
handling similar information.
But most companies already have repositories of
information in their production systems and
many of them are centralized.
10
The architecture consistsof various
interconnected elements:
Operational and external database layer –
the source data for the DW
Information access layer – the tools the end
user access to extract and analyze the data
Data access layer – the interface between the
operational and information access layers
Metadata layer – the data directory or
repository of metadata information
11.
11
Additional layers are:
Process management layer – the scheduler or
job controller
Application messaging layer – the
“middleware” that transports information
around the firm
Physical data warehouse layer – where the
actual data used in the DSS are located
Data staging layer – all of the processes
necessary to select, edit, summarize and load
warehouse data from the operational and
external data bases
14
Three Data WarehouseModels
Enterprise Warehouse- Collects all of the
information about subjects spanning the entire
organization.
The Central data warehouse- A single physical
database contains all of the data for a specific
functional area.
The distributed data warehouse- The
components are distributed across several
physical databases.
15.
15
Extraction, Transformation andLoading (ETL)
Data extraction
- get data from multiple, heterogeneous, and external
sources
Data cleaning
–detect errors in the data and rectify them when possible
Data transformation
–convert data from legacy or host format to warehouse
format
Loading
–sort, summarize, consolidate, compute views, check
integrity, and build indices and partitions
Refresh
–propagate the updates from the data sources to the
warehouse
16.
16
Design of DataWarehouse: A Business Analysis
Framework
Four views regarding the design of a data warehouse
Top-down view
It allows selection of the relevant information necessary for
the data warehouse
Data source view
It exposes the information being captured, stored, and
managed by operational systems
Data warehouse view
It consists of fact tables and dimension tables
Business query view
It shows the perspectives of data in the warehouse from the
view of end-user.
17.
17
Multidimensional Data Model
FromTables and Spreadsheets to Data Cubes
A data warehouse is based on a multidimensional data
model which views data in the form of a data cube
A data cube, such as sales, allows data to be modeled
and viewed in multiple dimensions
Dimension tables[Eg. item (item_name, brand, type), or
time(day, week, month, quarter, year) ]
Fact table contains measures (such as dollars_sold) and
keys to each of the related dimension tables
In data warehousing literature, an n-D base cube is
called a base cuboid. The top most 0-D cuboid, which
holds the highest-level of summarization, is called the
apex cuboid. The lattice of cuboids forms a data cube.
19
Data Cube
n-dimensionalcube C[A1,A2,…An] is a database
with n dimensions as A1,A2…An each of which
represents a theme and contains |Ai| number of
distinct elements in the dimension Ai.
Each distinct element of Ai corresponds to a data
row of C.
A data cell in the cube C[a1,a2,..an] stores the
numeric measures of the data for Ai=ai, for all i.
Eg. Data cube C[place, product, year]
Data Cell [Peshawar, TV, 4Qtr] stores 314 as
numeric measure which represents three
dimensions with 3,1 and 4rows respectively.
22
Typical Data cubeOLAP Operations
Roll up (drill-up): Summarize data by climbing
up hierarchy or by dimension reduction
Drill down(roll down): Reverse of roll-up
From higher level summary to lower level
summary or detailed data, or introducing new
dimensions
Slice and Dice: Project and select
Pivot (rotate): reorient the cube visualization,
3D to series of 2D planes
Drill across: involving (across) more than one
fact table
Drill through: through the bottom level of the
cube to its back-end relational tables (using SQL)
24
Conceptual Modeling ofData Warehouses
Dimension Schema
Modeling data warehouses: dimensions &
measures
Starschema: A fact table in the middle
connected to a set of dimension tables
Snowflakeschema: A refinement of star schema
where some dimensional hierarchy is
normalized into a set of smaller dimension
tables, forming a shape similar to snowflake
Factconstellations: Multiple fact tables share
dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact
constellation
25.
25
Star Schema
Easyfor users to understand
Fast response to queries
Simple metadata
Supported by many front end tools
Less robust to change
Slower to build
Does not support history
27
Snowflake Schema
Directuse by some tools
More flexible to change
Provides for speedier data loading
May become large and unmanageable
Degrades query performance
More complex metadata
29
Fact Constellation
Measureof online analytical processing
Collection of multiple fact tables
Sharing dimension tables
Viewed as a collection of stars.
Improvement over Star schema.