CHAPTER 2 - Datawarehouse Architecture.pptx

1
Data warehousing and
Data Mining
Dr.R.U.Anitha

2
Chapter # 2
Data Warehouse Architecture & Models
Text Book: Modern Data Warehousing, Mining,
and Visualization: Core Concepts
by George M. Marakas

3
 Learning Objectives
 The main objective of this chapter is to let the
students know the basic structure of a Data
warehouse.
 Learn about Data warehouse physical
architecture and various principles of a Data
warehousing.
 To make students understand a data warehouse
model.
 It also explains various types of
multidimensional models and Schemas.

4
 Major Topics
 Structure of a Data Warehouse
 Data Warehouse Architectures
 Three Data Warehouse Models
 Extraction, Transformation and Loading (ETL)
 Design of Data Warehouse
 Multidimensional Data Model
 Data cube
 Conceptual Modeling of Data Warehouses
Dimension Schema

5
Structure of a Data Warehouse
 Essentially, a data warehouse provides historical
data for decision-support applications.
 Such applications include reporting, online
analytical processing (OLAP), executive
information systems (EIS), and data mining.
 Data warehouse is a centralized, integrated
repository of information.

6
 Here, integrated means cleaned up, merged, and
redesigned. This may be more or less
complicated
 Depending on how many systems feed into a
warehouse and how widely they differ in
handling similar information.
 But most companies already have repositories of
information in their production systems and
many of them are centralized.

7
Data Warehouse Architectures: Conceptual
View
Single-layer
–Every data element is stored once only
–Virtual warehouse

8
Two-layer
 Real-time + derived data
–Most
commonly used approach in industry today

9
Three-layer Architecture: Conceptual View
 Transformation of real-time data to derived data
really requires two steps

10
The architecture consists of various
interconnected elements:
 Operational and external database layer –
the source data for the DW
 Information access layer – the tools the end
user access to extract and analyze the data
 Data access layer – the interface between the
operational and information access layers
 Metadata layer – the data directory or
repository of metadata information

11
Additional layers are:
 Process management layer – the scheduler or
job controller
 Application messaging layer – the
“middleware” that transports information
 around the firm
 Physical data warehouse layer – where the
actual data used in the DSS are located
 Data staging layer – all of the processes
necessary to select, edit, summarize and load
warehouse data from the operational and
external data bases

12
Components of the Data Warehouse
Architecture

13
Data Warehouse Architecture

14
Three Data Warehouse Models
 Enterprise Warehouse- Collects all of the
information about subjects spanning the entire
organization.
 The Central data warehouse- A single physical
database contains all of the data for a specific
functional area.
 The distributed data warehouse- The
components are distributed across several
physical databases.

15
Extraction, Transformation and Loading (ETL)
 Data extraction
- get data from multiple, heterogeneous, and external
sources
 Data cleaning
–detect errors in the data and rectify them when possible
 Data transformation
–convert data from legacy or host format to warehouse
format
 Loading
–sort, summarize, consolidate, compute views, check
integrity, and build indices and partitions
 Refresh
–propagate the updates from the data sources to the
warehouse

16
Design of Data Warehouse: A Business Analysis
Framework
Four views regarding the design of a data warehouse
 Top-down view
It allows selection of the relevant information necessary for
the data warehouse
 Data source view
It exposes the information being captured, stored, and
managed by operational systems
 Data warehouse view
It consists of fact tables and dimension tables
 Business query view
It shows the perspectives of data in the warehouse from the
view of end-user.

17
Multidimensional Data Model
From Tables and Spreadsheets to Data Cubes
 A data warehouse is based on a multidimensional data
model which views data in the form of a data cube
 A data cube, such as sales, allows data to be modeled
and viewed in multiple dimensions
Dimension tables[Eg. item (item_name, brand, type), or
time(day, week, month, quarter, year) ]
Fact table contains measures (such as dollars_sold) and
keys to each of the related dimension tables
 In data warehousing literature, an n-D base cube is
called a base cuboid. The top most 0-D cuboid, which
holds the highest-level of summarization, is called the
apex cuboid. The lattice of cuboids forms a data cube.

19
Data Cube
 n-dimensional cube C[A1,A2,…An] is a database
with n dimensions as A1,A2…An each of which
represents a theme and contains |Ai| number of
distinct elements in the dimension Ai.
 Each distinct element of Ai corresponds to a data
row of C.
 A data cell in the cube C[a1,a2,..an] stores the
numeric measures of the data for Ai=ai, for all i.
 Eg. Data cube C[place, product, year]
 Data Cell [Peshawar, TV, 4Qtr] stores 314 as
numeric measure which represents three
dimensions with 3,1 and 4rows respectively.

22
Typical Data cube OLAP Operations
 Roll up (drill-up): Summarize data by climbing
up hierarchy or by dimension reduction
 Drill down(roll down): Reverse of roll-up
 From higher level summary to lower level
summary or detailed data, or introducing new
dimensions
 Slice and Dice: Project and select
 Pivot (rotate): reorient the cube visualization,
3D to series of 2D planes
 Drill across: involving (across) more than one
fact table
 Drill through: through the bottom level of the
cube to its back-end relational tables (using SQL)

24
Conceptual Modeling of Data Warehouses
Dimension Schema
 Modeling data warehouses: dimensions &
measures
 Starschema: A fact table in the middle
connected to a set of dimension tables
 Snowflakeschema: A refinement of star schema
where some dimensional hierarchy is
normalized into a set of smaller dimension
tables, forming a shape similar to snowflake
 Factconstellations: Multiple fact tables share
dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact
constellation

25
Star Schema
 Easy for users to understand
 Fast response to queries
 Simple metadata
 Supported by many front end tools
 Less robust to change
 Slower to build
 Does not support history

27
Snowflake Schema
 Direct use by some tools
 More flexible to change
 Provides for speedier data loading
 May become large and unmanageable
 Degrades query performance
 More complex metadata

29
Fact Constellation
 Measure of online analytical processing
 Collection of multiple fact tables
 Sharing dimension tables
 Viewed as a collection of stars.
 Improvement over Star schema.

CHAPTER 2 - Datawarehouse Architecture.pptx

More Related Content

Similar to CHAPTER 2 - Datawarehouse Architecture.pptx

More from AnithaSakthivel3

Recently uploaded

CHAPTER 2 - Datawarehouse Architecture.pptx