Characteristics of a Data Warehouse
– A data warehouse is a database designed for querying, reporting, and analysis.
– A data warehouse contains historical data derived from transaction data.
– Data warehouses separate analysis workload from transaction workload.
– A data warehouse is primarily an analytical tool.
Comparing OLTP and Data Warehouses
OLTP                  Characteristic                 Data Warehouse
Many                  Joins                          Some
Comparatively lower   Data accessed by queries       Large amount
Normalized            Duplicated data                Denormalized
Rare                  Derived data and aggregates    Common
Data Warehouse Architectures
[Diagram: data warehouse architecture. Surviving labels: Flat files, Data mining]
Data Warehouse Design
• Key data warehouse design considerations:
– Identify the specific data content.
– Recognize the critical relationships within and between groups of data.
– Define the system environment supporting your data warehouse.
– Identify the required data transformations.
– Calculate the frequency at which the data must be refreshed.
– A logical design is conceptual and abstract.
– Entity-relationship (ER) modeling is useful in identifying logical information requirements.
• An entity represents a chunk of data.
• The properties of entities are known as attributes.
• The links between entities and attributes are known as relationships.
– Dimensional modeling is a specialized type of ER modeling useful in data warehouse design.
Oracle Warehouse Builder
– Oracle Database provides tools to implement the extract, transform, and load (ETL) process.
• Oracle Warehouse Builder is a tool to help in this process.
– Oracle Warehouse Builder generates the following types of code:
• SQL data definition language (DDL) scripts
• PL/SQL programs
• SQL*Loader control files
• XML Processing Description Language (XPDL)
• ABAP code (used to extract data from SAP systems)
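The ETL process mentioned above can be sketched in a few lines. This is a minimal illustration only, not code generated by Oracle Warehouse Builder: the CSV content, the table name sales_stage, and the function names are all hypothetical, and SQLite stands in for the target database.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this could be a flat file
# or the result of a query against a transactional system.
SOURCE_CSV = """order_id,order_date,amount
1,2024-01-05, 19.90
2,2024-01-06,5.00
3,2024-01-06,not-a-number
"""

def extract(text):
    """Extract: read raw rows (as dictionaries) from the CSV text."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast types and drop rows whose amount cannot be parsed."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # reject malformed rows instead of loading them
        clean.append((int(row["order_id"]), row["order_date"], amount))
    return clean

def load(conn, rows):
    """Load: insert the cleaned rows into a (hypothetical) staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_stage ("
        "order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_stage VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT COUNT(*), SUM(amount) FROM sales_stage").fetchone()
```

Two of the three source rows survive the transform step; the malformed row is rejected before loading.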
Data Warehousing Schemas
– Objects can be arranged in data warehousing
schema models in a variety of ways:
• Star schema
• Snowflake schema
• Third normal form (3NF) schema
• Hybrid schemas
– The source data model and user requirements should steer the data warehouse schema.
– Implementation of the logical model may require changes to enable you to adapt it to your physical system.
– Star schema
• Characterized by one or more large fact tables and a
number of much smaller dimension tables
• Each dimension table joined to the fact table using a
primary key to foreign key join
– Snowflake schema
• Dimension data grouped into multiple tables instead
of one large table
• Increased number of dimension tables, requiring
more foreign key joins
– Third normal form (3NF) schema
• A classical relational-database model that minimizes
data redundancy through normalization
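A star schema can be sketched as one fact table joined to small dimension tables on their keys. The example below uses SQLite through Python purely for illustration; the table and column names (fact_sales, dim_product, dim_time) and the sample rows are assumptions, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Small dimension tables, each with a primary key
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, year INTEGER);

-- Fact table: one foreign key per dimension plus the measure
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    amount     REAL
);

INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO dim_time    VALUES (10, 2023), (11, 2024);
INSERT INTO fact_sales  VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0);
""")

# Typical star query: join the fact table to each dimension, then aggregate
totals = conn.execute("""
    SELECT p.category, t.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_time    t ON t.time_id    = f.time_id
    GROUP BY p.category, t.year
    ORDER BY p.category, t.year
""").fetchall()
```

In a snowflake variant, dim_product would itself be split (for example into product and category tables), adding one more join to the same query.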
Data Warehousing Objects
– Fact tables
• Fact tables are the large tables that store business measurements.
– Dimension tables
• A dimension is a structure composed of one or more hierarchies that categorizes data.
• A unique identifier is specified for each distinct record in a dimension table.
• Relationships guarantee business integrity.
– A fact table must be defined for each star schema.
– A fact table contains either detail-level or aggregated facts.
– A fact table usually contains facts with the same level of aggregation.
– The primary key of the fact table is usually a composite key made up of all its foreign keys.
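The composite key described above can be illustrated as follows. This is a sketch under assumptions: the column names are hypothetical, the dimension tables are omitted for brevity, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE fact_sales (
    product_id INTEGER NOT NULL,   -- would reference a product dimension
    time_id    INTEGER NOT NULL,   -- would reference a time dimension
    store_id   INTEGER NOT NULL,   -- would reference a store dimension
    amount     REAL,
    PRIMARY KEY (product_id, time_id, store_id)  -- composite key
)""")

conn.execute("INSERT INTO fact_sales VALUES (1, 10, 100, 20.0)")
conn.execute("INSERT INTO fact_sales VALUES (1, 11, 100, 30.0)")  # differs in time_id

def try_insert(row):
    """Return True if the row is accepted, False if the key already exists."""
    try:
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Same (product_id, time_id, store_id) combination as the first row
duplicate_accepted = try_insert((1, 10, 100, 99.0))
```

The duplicate is rejected because the combination of all three foreign key columns must be unique, even though each individual column value repeats freely.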
Dimensions and Hierarchies
– A dimension is a structure composed of one or more hierarchies that categorizes data.
– Dimensional attributes help to describe the dimensional value.
– Dimension data is collected at the lowest level of detail and aggregated into higher level totals.
– Hierarchies are structures that use ordered levels to organize data.
– In a hierarchy, each level is connected to the levels above and below it.
[Diagram: hierarchy (by level): REGION, SUBREGION, COUNTRY, STATE, CITY, CUSTOMER]
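Aggregating from the lowest level into higher level totals can be sketched as a roll-up along the hierarchy. The level names follow the diagram labels; the sample data and the roll_up function are hypothetical.

```python
from collections import defaultdict

# Ordered levels, highest first, as in the hierarchy diagram
LEVELS = ["region", "subregion", "country", "state", "city", "customer"]

# Sales recorded at the lowest level (customer); each path lists one
# value per level in LEVELS order. Sample data is made up.
sales = [
    (("EMEA", "Western Europe", "France", "Ile-de-France", "Paris", "C1"), 100.0),
    (("EMEA", "Western Europe", "France", "Ile-de-France", "Paris", "C2"), 50.0),
    (("EMEA", "Western Europe", "Germany", "Bavaria", "Munich", "C3"), 70.0),
    (("Americas", "North America", "USA", "California", "San Jose", "C4"), 30.0),
]

def roll_up(records, level):
    """Sum amounts for every distinct path prefix down to the given level."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(float)
    for path, amount in records:
        totals[path[:depth]] += amount
    return dict(totals)

by_country = roll_up(sales, "country")
by_region = roll_up(sales, "region")
```

Because each level is connected to the one above it, the country totals can themselves be summed to reproduce the region totals.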
Data Warehouse Physical Structures
• Tables and partitioned tables
– Partitioned tables enable you to split
large data volumes into smaller,
more manageable pieces.
– Expect performance benefits from:
• Partition pruning
• Intelligent parallel processing
– Compressed tables offer scaleup opportunities
for read-only operations.
– Table compression saves disk space.
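Partition pruning can be illustrated with a toy model in which a table is split by month and a date-range query skips partitions that cannot contain matching rows. This is a conceptual sketch, not how Oracle implements pruning; the data and the total_sales function are hypothetical.

```python
from datetime import date

# A "table" partitioned by (year, month); each partition holds (date, amount) rows
partitions = {
    (2024, 1): [(date(2024, 1, 5), 10.0), (date(2024, 1, 20), 15.0)],
    (2024, 2): [(date(2024, 2, 3), 7.0)],
    (2024, 3): [(date(2024, 3, 9), 12.0), (date(2024, 3, 30), 1.0)],
}

def total_sales(start, end):
    """Sum amounts with start <= date <= end, scanning only relevant partitions."""
    first, last = (start.year, start.month), (end.year, end.month)
    scanned = []
    total = 0.0
    for key, rows in partitions.items():
        if key < first or key > last:
            continue  # pruned: this partition cannot contain matching rows
        scanned.append(key)
        total += sum(amount for day, amount in rows if start <= day <= end)
    return total, scanned

total, scanned = total_sales(date(2024, 2, 1), date(2024, 3, 15))
```

The January partition is never read, which is the source of the performance benefit on large tables.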
Data Warehouse Physical Structures (continued)
– Views:
• Are tailored presentations of data contained in one or more tables or views
• Do not require any space in the database
– Materialized views:
• Are query results that have been stored in advance
• (Like indexes) are used transparently and improve performance
– Integrity constraints:
• Are used in data warehouses for query rewrite
– Dimensions:
• Are containers of logical relationships and do not require any space in the database
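The idea of storing query results in advance can be sketched with a precomputed summary that is consulted instead of rescanning the detail rows. This is only an analogy for a materialized view; the names fact_sales, refresh_summary, and total_for are hypothetical, and a real database decides transparently when to use the stored result.

```python
from collections import defaultdict

# Detail rows: (category, year, amount). Sample data is made up.
fact_sales = [
    ("Books", 2023, 20.0),
    ("Books", 2024, 30.0),
    ("Games", 2024, 50.0),
    ("Books", 2024, 5.0),
]

def refresh_summary(rows):
    """Precompute totals per (category, year), like refreshing a stored result."""
    summary = defaultdict(float)
    for category, year, amount in rows:
        summary[(category, year)] += amount
    return dict(summary)

sales_by_category_year = refresh_summary(fact_sales)

def total_for(category, year, use_summary=True):
    """Answer from the stored summary when allowed, else scan the detail rows."""
    if use_summary:
        return sales_by_category_year.get((category, year), 0.0)
    return sum(a for c, y, a in fact_sales if c == category and y == year)

fast = total_for("Books", 2024)
slow = total_for("Books", 2024, use_summary=False)
```

Both paths return the same answer; the summary path avoids touching the detail rows, at the cost of needing a refresh when the detail data changes.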
Managing Large Volumes of Data
• Work smarter in your data warehouse:
– Bitmap indexes/Star transformation
– Data compression
– Query rewrite
• Work harder in your data warehouse:
– Parallelism for all operations
• DBA tasks, such as loading, index creation, table
creation, data modification, backup and recovery
• End-user operations, such as queries
• Unbounded scalability: Real Application Clusters
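The bitmap index idea listed above can be sketched with Python integers used as bit sets: one bitmap per distinct column value, combined with a bitwise AND to answer a two-predicate query. The rows and function names are hypothetical, and this shows only the general technique, not Oracle's storage format.

```python
rows = [
    {"region": "EAST", "status": "OPEN"},
    {"region": "WEST", "status": "OPEN"},
    {"region": "EAST", "status": "CLOSED"},
    {"region": "EAST", "status": "OPEN"},
]

def build_bitmap_index(rows, column):
    """Map each distinct value to an integer whose bit i is set if row i has it."""
    index = {}
    for position, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << position)
    return index

def matching_positions(bitmap):
    """Return the row positions whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

region_idx = build_bitmap_index(rows, "region")
status_idx = build_bitmap_index(rows, "status")

# WHERE region = 'EAST' AND status = 'OPEN' becomes a single bitwise AND
hits = matching_positions(region_idx["EAST"] & status_idx["OPEN"])
```

Bitmaps work best on columns with few distinct values, which is why they suit the dimension keys filtered in star queries.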
I/O Performance in Data Warehouses
– I/O is typically the primary determinant of data warehouse performance.
– Data warehouse storage configurations should be chosen by I/O bandwidth, not storage capacity.
– Every component of the I/O subsystem should provide enough bandwidth, including:
• I/O channels
• I/O adapters
– In data warehouses, maximizing
sequential I/O throughput is critical.
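Why bandwidth matters more than capacity can be shown with simple arithmetic: the slowest component on the I/O path caps sequential throughput, and scan time is table size divided by that cap. All figures and names below are hypothetical.

```python
def effective_bandwidth_mb_s(components):
    """The I/O path is only as fast as its slowest component."""
    return min(components.values())

def full_scan_seconds(table_size_gb, components):
    """Time to read a table sequentially at the effective bandwidth."""
    return table_size_gb * 1024 / effective_bandwidth_mb_s(components)

# Hypothetical sustained throughput per component, in MB/s
io_path = {"disks": 1600, "io_channels": 800, "io_adapters": 1200}

bottleneck = effective_bandwidth_mb_s(io_path)
scan_time = full_scan_seconds(500, io_path)
```

With these numbers the channels limit throughput to 800 MB/s, so scanning 500 GB takes 640 seconds no matter how fast the disks are.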
Parallel Execution
– Reduces response time for data-intensive operations on large databases
– Benefits systems with the following characteristics:
• Multiprocessors, clusters, or massively parallel systems
• Sufficient I/O bandwidth
• Sufficient memory to support memory-intensive processes such
as sorts, hashing, and I/O buffers
[Diagram: Data on disk -> Scanners (four Scan processes) -> Sorters/Aggregators (four Sort processes) -> Q1, Q2, Q3, Q4]
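The scan-then-sort pattern in the diagram can be sketched with a worker pool: each worker filters and sorts one chunk, and the sorted runs are merged at the end. This is a simplified illustration with hypothetical names; it merges each worker's output instead of redistributing rows between separate scanner and sorter processes, and Python threads show the structure only, not real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

def scan_and_sort(chunk, predicate):
    """One worker: scan its chunk, keep matching values, sort them."""
    return sorted(value for value in chunk if predicate(value))

def parallel_sorted_scan(chunks, predicate, workers=4):
    """Run one scan-and-sort task per chunk, then merge the sorted runs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_runs = list(pool.map(lambda c: scan_and_sort(c, predicate), chunks))
    return list(heapq.merge(*sorted_runs))

# Four chunks standing in for four slices of data on disk
data_chunks = [[9, 2, 7], [4, 8, 1], [6, 3, 5], [12, 10, 11]]
result = parallel_sorted_scan(data_chunks, lambda v: v % 2 == 0)
```

The merge step is cheap because every run is already sorted, which is what lets the expensive scan and sort work be split across workers.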
Automatic Storage Management (ASM)
– Configuring storage for a database depends on many variables:
• Which data to put on which disk
• Logical unit number (LUN) configurations
• Database types and workloads: data warehouse, OLTP, DSS
• Trade-offs between available options
– ASM provides solutions to storage issues encountered in data warehouses.
Automatic Storage Management: Overview
– Portable and high-performance cluster file system
– Manages Oracle database files
– Data spread across disks to balance load
– Integrated mirroring across disks
– Solves many storage management challenges
[Diagram: software stack. Surviving labels: Application, Database, File system, Operating system]
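Spreading data across disks with mirroring can be sketched as round-robin placement of extents, with each copy on a different disk. This is a conceptual illustration only and does not reproduce ASM's actual allocation algorithm; the function and disk names are hypothetical.

```python
from collections import Counter

def place_extents(num_extents, disks, copies=2):
    """Assign each extent to `copies` distinct disks, rotating the primary disk."""
    if copies > len(disks):
        raise ValueError("need at least as many disks as copies")
    placement = {}
    for extent in range(num_extents):
        primary = extent % len(disks)  # round-robin striping
        placement[extent] = [
            disks[(primary + k) % len(disks)] for k in range(copies)
        ]
    return placement

disks = ["disk0", "disk1", "disk2"]
placement = place_extents(6, disks)

# Count how many extent copies land on each disk to check the balance
load = Counter(disk for copies in placement.values() for disk in copies)
```

Each disk ends up holding the same number of extent copies, and losing any single disk still leaves one copy of every extent on another disk.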