Chapter 1
Copyright © Prof. Gufran Qureshi
DATA WAREHOUSE ARCHITECTURES
Independent Data Mart
Dependent Data Mart and Operational Data Store
Logical Data Mart and Real-Time Data Warehouse
Three-Layer architecture
All involve some form of extract, transform and load (ETL)
Figure 9-2 Independent data mart data warehousing architecture
Data marts: Mini-warehouses, limited in scope
ETL: Separate ETL for each independent data mart
Data access complexity due to multiple data marts
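A minimal sketch of the "separate ETL for each independent data mart" point, assuming two hypothetical marts (`sales_mart`, `finance_mart`) and made-up source rows; none of these names come from the figure. Each mart extracts and transforms the same source on its own.

```python
# Hypothetical operational source rows (illustrative data only).
SOURCE_ORDERS = [
    {"order_id": 1, "store": "S1", "amount": "120.50"},
    {"order_id": 2, "store": "S2", "amount": "80.00"},
]

def etl_sales_mart(rows):
    """Separate ETL #1: keeps store-level detail for a sales mart."""
    return [{"store": r["store"], "amount": float(r["amount"])} for r in rows]

def etl_finance_mart(rows):
    """Separate ETL #2: re-extracts the same source, keeps only a total."""
    return {"total_revenue": sum(float(r["amount"]) for r in rows)}

# Each independent mart is loaded by its own pipeline.
sales_mart = etl_sales_mart(SOURCE_ORDERS)
finance_mart = etl_finance_mart(SOURCE_ORDERS)
```

The parsing of `amount` is written twice, once per mart, and a user who needs both answers has to query two differently shaped stores.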
Figure 9-3 Dependent data mart with operational data store: a three-level architecture
ETL: Single ETL for enterprise data warehouse (EDW)
Simpler data access
ODS provides option for obtaining current data
Dependent data marts loaded from EDW
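A minimal sketch of the dependent pattern, assuming made-up source rows and hypothetical helper names (`etl_to_edw`, `load_store_mart`). The source is cleaned once into the EDW, and the mart is filled from the EDW only. The ODS is not modelled here.

```python
# Hypothetical operational source rows (illustrative data only).
SOURCE_ORDERS = [
    {"order_id": 1, "store": "S1", "amount": "120.50"},
    {"order_id": 2, "store": "S2", "amount": "80.00"},
    {"order_id": 3, "store": "S1", "amount": "19.50"},
]

def etl_to_edw(rows):
    """Single ETL: clean the source once and load the EDW."""
    return [
        {"order_id": r["order_id"], "store": r["store"], "amount": float(r["amount"])}
        for r in rows
    ]

def load_store_mart(edw, store):
    """Dependent mart: filled from the EDW, never from the source systems."""
    return [row for row in edw if row["store"] == store]

edw = etl_to_edw(SOURCE_ORDERS)
s1_mart = load_store_mart(edw, "S1")
```

Because every mart reads the same cleaned rows, the transformation logic lives in one place.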
ETL: Near real-time ETL for Data Warehouse
ODS and data warehouse are one and the same
Data marts are NOT separate databases, but logical views of the data warehouse
Easier to create new data marts
Figure 9-4 Logical data mart and real-time warehouse architecture
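A minimal sketch of a logical data mart, using Python's built-in `sqlite3` module as a stand-in for the warehouse DBMS. The table `warehouse_sales` and the view `s1_mart` are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_sales (store TEXT, product TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('S1', 'pen', 2.0), ('S1', 'ink', 5.0), ('S2', 'pen', 3.0);
    -- A logical data mart: a view, not a separate database.
    CREATE VIEW s1_mart AS
        SELECT product, amount FROM warehouse_sales WHERE store = 'S1';
""")

before = conn.execute("SELECT COUNT(*) FROM s1_mart").fetchone()[0]
# A new warehouse row is visible through the view at once; no reload step.
conn.execute("INSERT INTO warehouse_sales VALUES ('S1', 'pad', 4.0)")
after = conn.execute("SELECT COUNT(*) FROM s1_mart").fetchone()[0]
```

Creating another mart is one more `CREATE VIEW` statement, which is why new data marts are easier to create in this architecture.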
Figure 9-5 Three-layer data architecture for a data warehouse
DATA CHARACTERISTICS
STATUS VS. EVENT DATA
Figure 9-6 Example of DBMS log entry
Status
Event = a database action (create/update/delete) that results from a transaction
Status
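A minimal sketch of the status/event distinction, assuming that a log entry records the status before the event, the event itself, and the status after it. The record layout, the `balance` field and its values are hypothetical.

```python
def apply_event(status, event):
    """Return the new status produced by one database action."""
    action, key, value = event
    new_status = dict(status)          # copy, so the old status stays intact
    if action in ("create", "update"):
        new_status[key] = value
    elif action == "delete":
        new_status.pop(key, None)
    return new_status

status_before = {"balance": 750}
event = ("update", "balance", 1000)    # e.g. caused by a deposit transaction
status_after = apply_event(status_before, event)

# Hypothetical log entry: the event bracketed by the two status values.
log_entry = {"before": status_before, "event": event, "after": status_after}
```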
With transient data, changes to existing records are written over previous records, thus destroying the previous data content
Figure 9-7 Transient operational data
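A minimal sketch of transient behaviour, assuming a hypothetical operational table held as a dictionary keyed by record id. The update replaces the record, so the earlier content cannot be recovered from the table.

```python
# Hypothetical operational table keyed by record id.
table = {"001": {"status": "active"}}

def update_transient(table, key, new_values):
    """Overwrite the record in place; the old content is not kept anywhere."""
    table[key] = new_values

update_transient(table, "001", {"status": "closed"})
```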
Periodic data are never physically altered or deleted once they have been added to the store
Figure 9-8 Periodic warehouse data
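A minimal sketch of a periodic store, assuming each change is appended as a new dated row. The field names, dates and the `as_of` helper are hypothetical, chosen only to show that earlier versions remain readable.

```python
store = []  # append-only list of record versions

def add_periodic(store, key, date, action, values):
    """Append a new dated version; earlier versions are left untouched."""
    store.append({"key": key, "date": date, "action": action, "values": values})

add_periodic(store, "001", "2024-01-05", "create", {"status": "active"})
add_periodic(store, "001", "2024-02-10", "update", {"status": "closed"})
# A logical delete is also just another appended row.
add_periodic(store, "001", "2024-03-01", "delete", None)

def as_of(store, key, date):
    """Return the values that were current for `key` on `date` (ISO strings)."""
    versions = [v for v in store if v["key"] == key and v["date"] <= date]
    if not versions:
        return None
    return max(versions, key=lambda v: v["date"])["values"]
```

ISO-formatted date strings compare correctly as plain strings, which keeps the sketch free of date parsing.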
OTHER DATA WAREHOUSE CHANGES
• New descriptive attributes
• New business activity attributes
• New classes of descriptive attributes
• Descriptive attributes become more refined
• Descriptive data are related to one another
• New source of data
DERIVED DATA
• Objectives
  • Ease of use for decision support applications
  • Fast response to predefined user queries
  • Customized data for particular target audiences
  • Ad-hoc query support
  • Data mining capabilities
• Characteristics
  • Detailed (mostly periodic) data
  • Aggregate (for summary)
  • Distributed (to departmental servers)
Most common data model = dimensional model (usually implemented as a star schema)
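A minimal sketch of the "aggregate (for summary)" characteristic listed above, assuming hypothetical detail rows of the form (period, store, units sold). The summary is computed once so that a predefined query becomes a simple lookup.

```python
from collections import defaultdict

# Hypothetical detailed rows: (period, store, units_sold)
detail = [
    ("2024-01", "S1", 5),
    ("2024-01", "S2", 3),
    ("2024-02", "S1", 7),
    ("2024-01", "S1", 2),
]

def build_summary(rows):
    """Pre-aggregate units sold per (period, store)."""
    totals = defaultdict(int)
    for period, store, units in rows:
        totals[(period, store)] += units
    return dict(totals)

summary = build_summary(detail)
```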
Figure 9-9 Components of a star schema
Fact tables contain factual or quantitative data
Dimension tables contain descriptions about the subjects of the business
1:N relationship between dimension tables and fact tables
Excellent for ad-hoc queries, but bad for online transaction processing
Dimension tables are denormalized to maximize performance
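A minimal sketch of these components, using Python's built-in `sqlite3` module. The table and column names (`product_dim`, `store_dim`, `sales_fact`, `units_sold`) and the sample values are hypothetical and are not taken from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per subject.
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT);
    -- Fact table: quantitative data plus one foreign key per dimension.
    -- (SQLite only enforces REFERENCES when PRAGMA foreign_keys is ON.)
    CREATE TABLE sales_fact (
        product_key INTEGER REFERENCES product_dim(product_key),
        store_key   INTEGER REFERENCES store_dim(store_key),
        units_sold  INTEGER
    );
    INSERT INTO product_dim VALUES (1, 'pen');
    INSERT INTO store_dim   VALUES (10, 'Pune');
    INSERT INTO sales_fact  VALUES (1, 10, 4), (1, 10, 6);
""")

# 1:N: one dimension row is referenced by many fact rows.
facts_for_pen = conn.execute(
    "SELECT COUNT(*) FROM sales_fact WHERE product_key = 1"
).fetchone()[0]
```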
Figure 9-10 Star schema example
Fact table provides statistics for sales broken down by product, period and store dimensions
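A minimal sketch of an ad-hoc query against such a schema, using Python's built-in `sqlite3` module. The column names and all sample rows are hypothetical; only the three dimensions (product, period, store) come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales (
        product_key INTEGER, period_key INTEGER, store_key INTEGER,
        units_sold INTEGER, dollars_sold REAL
    );
    INSERT INTO product VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO period  VALUES (1, 'Jan'), (2, 'Feb');
    INSERT INTO store   VALUES (1, 'Pune'), (2, 'Mumbai');
    INSERT INTO sales VALUES
        (1, 1, 1, 10, 20.0), (1, 2, 1, 5, 10.0),
        (2, 1, 2, 3, 15.0),  (1, 1, 2, 4, 8.0);
""")

# January sales per product, summed over all stores.
rows = conn.execute("""
    SELECT p.description, t.month, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales AS f
    JOIN product AS p ON p.product_key = f.product_key
    JOIN period  AS t ON t.period_key  = f.period_key
    WHERE t.month = 'Jan'
    GROUP BY p.description, t.month
    ORDER BY p.description
""").fetchall()
```

Adding `JOIN store` and grouping by `city` would break the same totals down by store without any change to the tables.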
Figure 9-11 Star schema with sample data

Architecture of Data Warehouse for Data Science
