Historical data × Detailed data × Diverse data = Lots of data
Splitting data over multiple storage media based on frequency of usage
Archival storage is very similar to near-line storage, except that in archival storage the probability of access drops very low. To put the probability of access in perspective, consider the following simple chart:
High-performance disk storage: access a unit of data once a month
Near-line storage: access 0.5 units of data every year
Archival storage: access 0.1 units of data every decade
Near-line storage can be thought of as a logical extension of the data warehouse. Archival storage cannot be thought of as a logical extension.
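To compare the three access rates in the chart on a common scale, the short sketch below converts each one to units of data accessed per year. The per-year normalization and the names `ACCESS_RATES` and `accesses_per_year` are illustrative assumptions, not part of the source chart.

```python
from fractions import Fraction

# Access rates from the chart, as (units of data, period in years).
# Fractions keep the arithmetic exact; the per-year scale is an assumption
# made here only to put the three tiers side by side.
ACCESS_RATES = {
    "high-performance disk": (Fraction(1), Fraction(1, 12)),  # 1 unit per month
    "near-line": (Fraction(1, 2), Fraction(1)),               # 0.5 units per year
    "archival": (Fraction(1, 10), Fraction(10)),              # 0.1 units per decade
}


def accesses_per_year(units, period_years):
    """Normalize an access rate to units of data accessed per year."""
    return units / period_years


per_year = {
    tier: accesses_per_year(units, period)
    for tier, (units, period) in ACCESS_RATES.items()
}

for tier, rate in per_year.items():
    print(f"{tier}: {float(rate)} units per year")
```

On this scale, disk data is accessed 12 times a year, near-line data 0.5 times a year, and archival data 0.01 times a year.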
Options for Moving Data:
Manual. Advantages: very simple; available immediately; operates at the row level. Disadvantages: prone to error; requires human interaction.
HSM. Advantages: relatively simple; not too expensive; fully automated. Disadvantages: operates at the data set level.
CMSM. Advantages: fully automated; operates at the row level. Disadvantages: expensive; complex to implement and operate.
Third-party monitors are much better, because the monitors supplied by the DBMS vendors require far more resources than those supplied by third parties.
The Extension of the Data Warehouse across Different Storage Media: The data warehouse can grow to petabytes (a petabyte is equivalent to a quadrillion bytes) of data and can still be effective and still be managed.
1. Building the Data Warehouse, by Inmon. Chapter 12: The Really Large Data Warehouse. http://it-slideshares.blogspot.com/
2. Why the Rapid Growth?The Impact of Large Volumes of DataDisk Storage in the Face of Data SeparationMoving Data from One Environment to AnotherInverting the Data WarehouseTotal CostMaximum CapacitySummary
3. Why the Rapid Growth?The data warehouse contains history.Data warehouses collect data at the most granular levelThe need to bring lots of different kinds of data together
4. The Impact of Large Volumes of Data: Basic Data-Management Activities ◦ As data volumes grow large, normal database functions require increasingly large amounts of resources. The Cost of Storage ◦ As the volume of data grows, the cost of the data increases dramatically.
5. The Impact of Large Volumes of Data: The Real Costs of Storage ◦ There are lots of components to disk storage aside from the storage device itself: ▪ Disk controller ▪ Communications lines ▪ Processor ▪ Software
6. The Impact of Large Volumes of Data: The Usage Pattern of Data in the Face of Large Volumes ◦ Over time, as the volume of data grows, the percentage of data actually used drops.
7. The Impact of Large Volumes of Data: A Simple Calculation ◦ Usage ratio = Actual bytes used / Total data warehouse bytes ◦ As the volume of data in your data warehouse goes up, the actual percentage used goes down. Two Classes of Data ◦ Infrequently used data is often called dormant data or inactive data. ◦ Frequently used data is often called actively used data.
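The usage ratio on this slide can be computed directly. The sketch below is a minimal illustration; the function name `usage_ratio` and the byte counts are hypothetical figures chosen only to show the ratio falling as the warehouse grows.

```python
def usage_ratio(actual_bytes_used: int, total_warehouse_bytes: int) -> float:
    """Usage ratio = actual bytes used / total data warehouse bytes."""
    if total_warehouse_bytes <= 0:
        raise ValueError("total_warehouse_bytes must be positive")
    if not 0 <= actual_bytes_used <= total_warehouse_bytes:
        raise ValueError("actual_bytes_used must lie between 0 and the total")
    return actual_bytes_used / total_warehouse_bytes


# Hypothetical figures: a 500 GB warehouse with 400 GB in active use,
# and the same warehouse later at 50 TB with 2 TB in active use.
small = usage_ratio(400 * 10**9, 500 * 10**9)
large = usage_ratio(2 * 10**12, 50 * 10**12)

print(f"small warehouse: {small:.0%} used")
print(f"large warehouse: {large:.0%} used")
```

With these assumed figures the actively used volume grows fivefold, yet the ratio drops from 80% to 4%, which is the pattern the slide describes.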
8. The Impact of Large Volumes of Data: Implications of Separating Data into Two Classes
9. Disk Storage in the Face of Data Separation: Near-Line Storage ◦ Near-line storage (depending on the vendor) is sequential storage. ◦ Characteristics: ▪ Robotically controlled ▪ Inexpensive ▪ Bulk amounts of data ▪ Reliable over a long period of time ▪ Seconds to access first record
10. Disk Storage in the Face of Data Separation: Access Speed and Disk Storage ◦ The difference between freely flowing blood and blood with many restricting components
11. Disk Storage in the Face of Data Separation: Archival Storage ◦ Large amounts of data need to be split across storage media to be managed ◦ A further tier besides disk storage and near-line (bulk) storage ◦ Different from near-line storage
12. Disk Storage in the Face of Data Separation: Implications of Transparency ◦ A record or row in the data warehouse is identical to a record or row in near-line storage.
13. Moving Data from One Environment to Another: Many ways ◦ Have a database administrator manually move data ◦ Hierarchical storage management (HSM) ◦ Cross-media storage management (CMSM)
14. Moving Data from One Environment to Another: The CMSM Approach ◦ The CMSM technology is fully automated. ◦ The CMSM is software that makes the physical location of the data transparent. ◦ The end user does not need to know where the data is, whether in the data warehouse or on near-line storage.
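The location transparency described on this slide can be illustrated with a toy two-tier store. This is a sketch only: the `TieredStore` class, its method names, and the use of in-memory dictionaries are all hypothetical and do not reflect the interface of any actual CMSM product.

```python
class TieredStore:
    """Toy two-tier store: callers read rows by key without naming a tier."""

    def __init__(self):
        self.disk = {}       # actively used rows (stands in for disk storage)
        self.near_line = {}  # dormant rows (stands in for near-line storage)

    def put(self, key, row):
        """New rows land on disk."""
        self.disk[key] = row

    def demote(self, key):
        """Move a single row, not a whole data set, from disk to near-line."""
        self.near_line[key] = self.disk.pop(key)

    def get(self, key):
        """Return the row wherever it lives; the caller never names a tier."""
        if key in self.disk:
            return self.disk[key]
        if key in self.near_line:
            return self.near_line[key]
        raise KeyError(key)


store = TieredStore()
store.put("order-1", {"amount": 120})
store.put("order-2", {"amount": 95})
store.demote("order-1")  # order-1 is now on near-line storage

# The same call works for both rows, regardless of where each one is held.
rows = [store.get("order-1"), store.get("order-2")]
print(rows)
```

Moving one row at a time in `demote` mirrors the row-level operation attributed to CMSM, in contrast to HSM, which operates at the data set level.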
15. Moving Data from One Environment to Another: A Data Warehouse Usage Monitor ◦ Streamlines the operations of the CMSM environment ◦ Two types: ▪ those supplied by the DBMS vendor ▪ those supplied by third parties
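One way to picture what a usage monitor contributes is a counter of accesses per row that reports which rows are dormant and could be moved to near-line storage. The sketch below is an assumption-laden toy: `UsageMonitor`, `record`, `dormant`, and the `min_hits` threshold are hypothetical names, not the design of any vendor or third-party monitor.

```python
from collections import Counter


class UsageMonitor:
    """Toy monitor: counts accesses per row key and lists dormant rows."""

    def __init__(self, all_keys):
        self.all_keys = set(all_keys)
        self.hits = Counter()

    def record(self, key):
        """Note one access to a row."""
        self.hits[key] += 1

    def dormant(self, min_hits=1):
        """Rows accessed fewer than min_hits times during the observed period."""
        return sorted(k for k in self.all_keys if self.hits[k] < min_hits)


monitor = UsageMonitor(all_keys=[1, 2, 3, 4, 5])
for key in [1, 1, 3, 1, 3]:  # hypothetical access log
    monitor.record(key)

candidates = monitor.dormant()
print(candidates)  # rows that were never touched in the log
```

In this hypothetical log, rows 2, 4, and 5 are never accessed, so they are the candidates for relocation.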
16. Inverting the Data Warehouseinverteddata warehouse: Consider a normal data warehouse.To build a data warehouse: ◦ Normal way: put data first into disk storage  (after the data ages) near-line or archival storage ◦ Alternative way: first enter data into near-line storage (not disk storage)  data is “staged” from the near-line environment to the disk environment (to accessed and analyzed)  (after over) returned to near-line storage
17. Total CostWith the introduction of near-line and archival storage, the growing costs of a data warehouse can be mitigated
18. Maximum Capacity ◦ “XYZ machine can handle up to nnn terabytes of data.” ◦ Parameters that measure a machine's capacity: ▪ Volume of data ▪ Number of users ▪ Workload complexity ◦ The balanced case is where there is a fair amount of data, a fair number of users, and a reasonably complex workload.
19. SummaryData warehouses grow large explosivelyThe data inside the warehouse separates into one of two classes—frequently used data or infrequently used dataWithout near-line and/or archival storage, the costs of the data warehouseskyrocket as the data warehouse grows largehttp://it-slideshares.blogspot.com/