Building the Data Warehouse by Inmon

Chapter 12: The Really Large Data Warehouse

Why the Rapid Growth?
The Impact of Large Volumes of Data
Disk Storage in the Face of Data Separation
Moving Data from One Environment to Another
Inverting the Data Warehouse
Total Cost
Maximum Capacity
Summary
Why the Rapid Growth?
The data warehouse contains history.
Data warehouses collect data at the most granular level.
There is a need to bring lots of different kinds of data together.
The Impact of Large Volumes of Data
Basic Data-Management Activities
 ◦ As data volumes grow large, normal database functions require increasingly larger amounts of resources.
The Cost of Storage
 ◦ As the volume of data grows, the cost of the data increases dramatically.
The Impact of Large Volumes of Data
The Real Costs of Storage
 ◦ There are lots of components to disk storage aside from the storage device itself:
   - Disk controller
   - Communications lines
   - Processor
   - Software
The Impact of Large Volumes of Data
The Usage Pattern of Data in the Face of Large Volumes
 ◦ Over time, as the volume of data grows, the percentage of data actually used drops.
The Impact of Large Volumes of Data
A Simple Calculation (a short worked example follows this list)
 Usage ratio = Actual bytes used / Total data warehouse bytes
 ◦ As the volume of data found in your data warehouse goes up, the actual percentage used goes down.
Two Classes of Data
 ◦ Infrequently used data is often called dormant data or inactive data.
 ◦ Frequently used data is often called actively used data.
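A minimal Python sketch of the usage-ratio formula above; the byte counts are hypothetical and only show how the ratio falls as the total warehouse volume grows:

# Minimal sketch of the usage-ratio calculation described above.
# The byte counts below are hypothetical examples, not figures from the chapter.

def usage_ratio(actual_bytes_used: int, total_warehouse_bytes: int) -> float:
    """Usage ratio = actual bytes used / total data warehouse bytes."""
    if total_warehouse_bytes == 0:
        raise ValueError("total_warehouse_bytes must be greater than zero")
    return actual_bytes_used / total_warehouse_bytes

# A small 50 GB warehouse where 40 GB is actively queried ...
print(usage_ratio(40 * 10**9, 50 * 10**9))    # 0.8   (80% actively used)
# ... versus a 10 TB warehouse where the same 40 GB is actively queried.
print(usage_ratio(40 * 10**9, 10 * 10**12))   # 0.004 (0.4% actively used)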
The Impact of Large Volumes of Data
Implications of Separating Data into Two Classes
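One way to picture the implication: frequently used (active) data stays on high-performance disk, while infrequently used (dormant) data becomes a candidate for near-line or archival storage. The Python sketch below illustrates that split; the 90-day cutoff and the table statistics are assumptions made up purely for illustration:

# A hedged sketch of separating warehouse data into the two classes named above:
# actively used data (kept on disk) and dormant data (a candidate for near-line
# or archival storage). The 90-day threshold and the record layout are assumptions.

from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class TableStats:
    name: str
    last_accessed: date
    size_bytes: int

def classify(tables: list[TableStats], today: date, dormant_after_days: int = 90):
    """Split tables into (active, dormant) based on last access date."""
    cutoff = today - timedelta(days=dormant_after_days)
    active = [t for t in tables if t.last_accessed >= cutoff]
    dormant = [t for t in tables if t.last_accessed < cutoff]
    return active, dormant

stats = [
    TableStats("sales_2024", date(2024, 6, 1), 120 * 10**9),
    TableStats("sales_2015", date(2016, 2, 10), 800 * 10**9),
]
active, dormant = classify(stats, today=date(2024, 6, 15))
# 'active' stays on high-performance disk; 'dormant' moves to near-line storage.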
Disk Storage in the Face of Data Separation
Near-Line Storage
 ◦ Near-line storage (depending on the vendor) is sequential storage.
 ◦ Characteristics:
   - Robotically controlled
   - Inexpensive
   - Bulk amounts of data
   - Reliable over a long period of time
   - Seconds to access the first record
Disk Storage in the Face of Data Separation
Access Speed and Disk Storage
 ◦ Disk storage holding only actively used data versus disk storage cluttered with dormant data is like the difference between freely flowing blood and blood with many restricting components.
Disk Storage in the Face of Data Separation
Archival Storage
 ◦ The need for split storage to manage large amounts of data
 ◦ A third tier, beyond disk storage and near-line or bulk storage
 ◦ Differs from near-line storage in that the probability of access is much lower
Disk Storage in the Face of Data Separation
Implications of Transparency
 ◦ A record or row in the data warehouse is identical to a record or row in near-line storage.
Moving Data from One Environment to Another
Many ways:
 ◦ Have a database administrator manually move data
 ◦ Hierarchical storage management (HSM)
 ◦ The cross-media storage management (CMSM) option
Moving Data from One Environment to Another
The CMSM Approach
 ◦ The CMSM technology is fully automated.
 ◦ CMSM is software that makes the physical location of the data transparent.
 ◦ The end user does not need to know where the data is: in the data warehouse or on near-line storage.
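To make the idea of transparency concrete, here is a hedged Python sketch of what a CMSM-style layer does conceptually: the caller asks for a row and the layer decides whether it comes from warehouse disk or is recalled from near-line storage. The class and method names are hypothetical and do not reflect any particular CMSM product's interface:

# A conceptual sketch of a CMSM-style transparency layer. All names are
# hypothetical; real CMSM products expose their own interfaces.

class TransparentStore:
    def __init__(self, disk_store: dict, nearline_store: dict):
        self.disk = disk_store          # actively used rows on high-performance disk
        self.nearline = nearline_store  # dormant rows on near-line (sequential) storage

    def read(self, key):
        """Return the row regardless of which tier currently holds it."""
        if key in self.disk:
            return self.disk[key]
        # Recall from near-line storage; because rows are identical in both tiers,
        # the caller cannot tell the difference (apart from access time).
        row = self.nearline[key]
        self.disk[key] = row            # optionally stage the row back onto disk
        return row

store = TransparentStore(
    disk_store={"cust-001": {"name": "Acme", "region": "EU"}},
    nearline_store={"cust-944": {"name": "Globex", "region": "US"}},
)
print(store.read("cust-944"))  # fetched from near-line storage, transparently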
Moving Data from One Environment to Another
A Data Warehouse Usage Monitor
 ◦ Streamlines the operations of the CMSM environment
 ◦ Two types:
   - Those supplied by the DBMS vendor
   - Those supplied by third-party monitors
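A minimal sketch of what a usage monitor records: how often each table is actually touched, so rarely used data can be identified for migration to near-line storage. This illustrates the idea only; it is not the interface of any DBMS-supplied or third-party monitor:

# Sketch of a usage monitor that counts table accesses and flags dormant data.
# Names and thresholds are hypothetical, chosen only for illustration.

from collections import Counter

class UsageMonitor:
    def __init__(self):
        self.access_counts = Counter()

    def record_access(self, table: str) -> None:
        self.access_counts[table] += 1

    def dormant_candidates(self, min_accesses: int = 1) -> list[str]:
        """Tables accessed fewer than min_accesses times are candidates for near-line storage."""
        return [t for t, n in self.access_counts.items() if n < min_accesses]

monitor = UsageMonitor()
for table in ["orders_2024", "orders_2024", "orders_2009"]:
    monitor.record_access(table)
monitor.access_counts.setdefault("orders_2001", 0)   # known table, never accessed
print(monitor.dormant_candidates())                   # ['orders_2001']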
Inverting the Data Warehouse
The inverted data warehouse is an alternative to the normal data warehouse.
To build a data warehouse:
 ◦ Normal way: put data first into disk storage → (after the data ages) move it to near-line or archival storage
 ◦ Alternative (inverted) way: first enter data into near-line storage (not disk storage) → data is "staged" from the near-line environment to the disk environment (to be accessed and analyzed) → (after the analysis is over) returned to near-line storage (sketched below)
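The following Python sketch walks through the inverted flow: data is ingested straight into near-line storage, staged onto disk only when an analysis needs it, and released afterward. All names and data structures are hypothetical, chosen only to make the sequence concrete:

# A hedged sketch of the inverted data warehouse flow described above.

nearline: dict[str, list] = {}   # bulk, inexpensive, sequential storage
disk: dict[str, list] = {}       # high-performance storage used during analysis

def ingest(dataset: str, rows: list) -> None:
    """Inverted approach: new data goes straight to near-line storage."""
    nearline.setdefault(dataset, []).extend(rows)

def stage_for_analysis(dataset: str) -> list:
    """Copy the dataset from near-line storage onto disk so it can be analyzed."""
    disk[dataset] = list(nearline.get(dataset, []))
    return disk[dataset]

def release(dataset: str) -> None:
    """Analysis is over: free the disk copy; the data remains on near-line storage."""
    disk.pop(dataset, None)

ingest("telemetry_2024", [{"reading": 1}, {"reading": 2}])
rows = stage_for_analysis("telemetry_2024")   # analyze the staged rows on disk
release("telemetry_2024")                     # return to near-line-only storage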
Total Cost
With the introduction of near-line and archival storage, the growing costs of a data warehouse can be mitigated.
Maximum Capacity
"XYZ machine can handle up to nnn terabytes of data."
Parameters that measure a machine's capacity:
 - Volume of data
 - Number of users
 - Workload complexity

The balanced case is where there is a fair amount of data, a fair number of users, and a reasonably complex workload.
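A purely illustrative Python sketch of why a single "up to nnn terabytes" figure is not a complete capacity statement: the same machine effectively supports less data as the number of users and the workload complexity grow. The scaling model and every number below are assumptions, not vendor figures or formulas from the chapter:

# Hypothetical model: usable capacity shrinks as concurrent users and query
# complexity rise. The coefficients are invented for illustration only.

def effective_capacity_tb(rated_tb: float, users: int, complexity: float) -> float:
    """Assume usable capacity falls with more users and a more complex workload."""
    return rated_tb / (1 + 0.01 * users) / complexity

print(effective_capacity_tb(100, users=10, complexity=1.0))   # light workload -> ~90.9 TB
print(effective_capacity_tb(100, users=200, complexity=3.0))  # heavy workload -> ~11.1 TB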
Summary
Data warehouses grow explosively large.
The data inside the warehouse separates into one of two classes: frequently used data or infrequently used data.
Without near-line and/or archival storage, the costs of the data warehouse skyrocket as the data warehouse grows large.


Editor's Notes

  • #5 Historical data + Detailed data + Diverse data = Lots of data
  • #11 Splitting data over multiple storage media based on frequency of usage
  • #13 Archival storage is very similar to near-line storage, except that in archival storage the probability of access drops very low. To put the probability of access in perspective, consider the following simple chart:
    - High-performance disk storage: access a unit of data once a month
    - Near-line storage: access 0.5 units of data every year
    - Archival storage: access 0.1 units of data every decade
    Near-line storage can be thought of as a logical extension of the data warehouse. Archival storage cannot be thought of as a logical extension.
  • #15 Options for moving data:
    - Manual: advantages are that it is very simple, available immediately, and operates at the row level; disadvantages are that it is prone to error and requires human interaction.
    - HSM: advantages are that it is relatively simple, not too expensive, and fully automated; the disadvantage is that it operates at the data set level.
    - CMSM: advantages are that it is fully automated and operates at the row level; disadvantages are that it is expensive and complex to implement and operate.
  • #17 Third-party monitors are much better because the monitors supplied by the DBMS vendors require far more resources than those supplied by third parties. The Extension of the Data Warehouse across Different Storage Media: the data warehouse can grow to petabytes (a quadrillion bytes) of data and can still be effective and still be managed.