Building the Data Warehouse, by Inmon. Chapter 12: The Really Large Data Warehouse. http://it-slideshares.blogspot.com/
Lecture 12 The Really Large Data Warehouse
Published in: Education
  • Historical data, detailed data, and diverse data combine into lots of data.
  • Splitting data over multiple storage media based on frequency of usage
  • Archival storage is very similar to near-line storage, except that in archival storage the probability of access drops very low. To put the probability of access in perspective, consider the following simple chart:
      ◦ High-performance disk storage: access a unit of data once a month
      ◦ Near-line storage: access 0.5 units of data every year
      ◦ Archival storage: access 0.1 units of data every decade
    Near-line storage can be thought of as a logical extension of the data warehouse. Archival storage cannot be thought of as a logical extension.
  • Options for moving data:
      ◦ Manual. Advantages: very simple; available immediately; operates at the row level. Disadvantages: prone to error; requires human interaction.
      ◦ HSM. Advantages: relatively simple; not too expensive; fully automated. Disadvantages: operates at the data set level.
      ◦ CMSM. Advantages: fully automated; operates at the row level. Disadvantages: expensive; complex to implement and operate.
  • Third-party monitors are much better, because the monitors supplied by the DBMS vendors require far more resources than those supplied by third parties.
  • The Extension of the Data Warehouse across Different Storage Media: the data warehouse can grow to petabytes of data (a petabyte is a quadrillion bytes) and can still be effective and still be managed.

    1. Building the Data Warehouse, by Inmon. Chapter 12: The Really Large Data Warehouse. http://it-slideshares.blogspot.com/
    2. Outline:
       ◦ Why the Rapid Growth?
       ◦ The Impact of Large Volumes of Data
       ◦ Disk Storage in the Face of Data Separation
       ◦ Moving Data from One Environment to Another
       ◦ Inverting the Data Warehouse
       ◦ Total Cost
       ◦ Maximum Capacity
       ◦ Summary
    3. Why the Rapid Growth?
       ◦ The data warehouse contains history.
       ◦ Data warehouses collect data at the most granular level.
       ◦ There is a need to bring lots of different kinds of data together.
    4. The Impact of Large Volumes of Data
       Basic Data-Management Activities
       ◦ As data volumes grow large, normal database functions require increasingly larger amounts of resources.
       The Cost of Storage
       ◦ As the volume of data grows, the cost of the data increases dramatically.
    5. The Impact of Large Volumes of Data
       The Real Costs of Storage
       ◦ There are lots of components to disk storage aside from the storage device itself:
          ▪ Disk controller
          ▪ Communications lines
          ▪ Processor
          ▪ Software
    6. The Impact of Large Volumes of Data
       The Usage Pattern of Data in the Face of Large Volumes
       ◦ Over time, as the volume of data grows, the percentage of data actually used drops.
    7. The Impact of Large Volumes of Data
       A Simple Calculation
       Usage ratio = Actual bytes used / Total data warehouse bytes
       ◦ As the volume of data found in your data warehouse goes up, the actual percentage used goes down.
       Two Classes of Data
       ◦ Infrequently used data is often called dormant data or inactive data.
       ◦ Frequently used data is often called actively used data.
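
The usage ratio on slide 7 can be illustrated with a short calculation. This is a minimal sketch: the function name `usage_ratio` and the byte counts are hypothetical, chosen only to show the ratio falling as the warehouse grows faster than the actively used data.

```python
def usage_ratio(actual_bytes_used: int, total_warehouse_bytes: int) -> float:
    """Usage ratio = actual bytes used / total data warehouse bytes."""
    if total_warehouse_bytes <= 0:
        raise ValueError("total_warehouse_bytes must be positive")
    return actual_bytes_used / total_warehouse_bytes


TB = 10**12  # one terabyte, decimal

# Hypothetical snapshots (bytes used, total bytes): the warehouse grows
# much faster than the portion that queries actually touch.
snapshots = [(1 * TB, 2 * TB), (2 * TB, 10 * TB), (3 * TB, 50 * TB)]
ratios = [usage_ratio(used, total) for used, total in snapshots]

for (used, total), ratio in zip(snapshots, ratios):
    print(f"{total // TB:>3} TB total, {used // TB} TB used -> usage ratio {ratio:.0%}")
```

Whatever falls outside the used portion is the dormant data that the following slides propose moving off high-performance disk.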
    8. The Impact of Large Volumes of Data
       Implications of Separating Data into Two Classes
    9. Disk Storage in the Face of Data Separation
       Near-Line Storage
       ◦ Near-line storage (depending on the vendor) is sequential storage.
       ◦ Characteristics:
          ▪ Robotically controlled
          ▪ Inexpensive
          ▪ Bulk amounts of data
          ▪ Reliable over a long period of time
          ▪ Seconds to access first record
    10. Disk Storage in the Face of Data Separation
       Access Speed and Disk Storage
       ◦ The difference is like that between freely flowing blood and blood with many restricting components.
    11. Disk Storage in the Face of Data Separation
       Archival Storage
       ◦ Split storage is needed to manage large amounts of data.
       ◦ Archival storage is a further tier besides disk storage and near-line (bulk) storage.
       ◦ It differs from near-line storage.
    12. Disk Storage in the Face of Data Separation
       Implications of Transparency
       ◦ A record or row in the data warehouse is identical to a record or row in near-line storage.
    13. Moving Data from One Environment to Another
       Many ways:
       ◦ Have a database administrator manually move data
       ◦ Hierarchical storage management (HSM)
       ◦ The cross-media storage management (CMSM) option
    14. Moving Data from One Environment to Another
       The CMSM Approach
       ◦ The CMSM technology is fully automated.
       ◦ The CMSM is software that makes the physical location of the data transparent.
       ◦ The end user does not need to know where data is: in the data warehouse or on near-line storage.
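
The location transparency described on slide 14 can be sketched as a lookup that hides which medium holds a row. This is an illustration only, not the API of any CMSM product: the class `TransparentStore`, its methods, and the two in-memory dictionaries standing in for disk and near-line storage are all assumptions.

```python
class TransparentStore:
    """Hypothetical stand-in for a CMSM layer over two storage tiers."""

    def __init__(self) -> None:
        self.disk: dict[str, dict] = {}       # stands in for high-performance disk
        self.near_line: dict[str, dict] = {}  # stands in for near-line storage

    def put(self, key: str, row: dict) -> None:
        """New rows land on disk."""
        self.disk[key] = row

    def demote(self, key: str) -> None:
        """Move one row (row-level, not data-set-level) to near-line storage."""
        self.near_line[key] = self.disk.pop(key)

    def get(self, key: str) -> dict:
        """Return the row wherever it lives; the caller never names a tier."""
        if key in self.disk:
            return self.disk[key]
        if key in self.near_line:
            return self.near_line[key]
        raise KeyError(key)


store = TransparentStore()
store.put("cust-1", {"name": "Ada", "balance": 120})
store.put("cust-2", {"name": "Lin", "balance": 75})
store.demote("cust-2")  # cust-2 is now dormant data on near-line storage

# The same call works for both rows, and the row content is identical
# regardless of the medium it sits on.
rows = [store.get("cust-1"), store.get("cust-2")]
print(rows)
```

A real system would also decide when to move rows between tiers; that decision is left out of this sketch.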
    15. Moving Data from One Environment to Another
       A Data Warehouse Usage Monitor
       ◦ Streamlines the operations of the CMSM environment
       ◦ Two types:
          ▪ Those supplied by the DBMS vendor
          ▪ Those supplied by third-party vendors
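
One way to picture what a usage monitor contributes is a record of when each row was last touched, from which dormant rows can be listed. The sketch below is hypothetical throughout: `UsageMonitor`, the row identifiers, the dates, and the 180-day threshold are assumptions, and real monitors from DBMS or third-party vendors work differently in detail.

```python
from datetime import datetime, timedelta


class UsageMonitor:
    """Hypothetical row-level usage monitor; not any vendor's API."""

    def __init__(self) -> None:
        self.last_access: dict[str, datetime] = {}

    def record_access(self, row_id: str, when: datetime) -> None:
        """Remember the most recent time a row was read."""
        self.last_access[row_id] = when

    def dormant_rows(self, now: datetime, threshold: timedelta) -> list[str]:
        """Rows not accessed within `threshold` of `now`, sorted by id."""
        return sorted(
            row_id
            for row_id, seen in self.last_access.items()
            if now - seen > threshold
        )


monitor = UsageMonitor()
monitor.record_access("order-1", datetime(2023, 1, 15))
monitor.record_access("order-2", datetime(2023, 11, 20))
monitor.record_access("order-3", datetime(2022, 6, 1))
monitor.record_access("order-1", datetime(2023, 12, 28))  # order-1 is read again

dormant = monitor.dormant_rows(now=datetime(2024, 1, 1), threshold=timedelta(days=180))
print(dormant)  # candidates for near-line storage
```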
    16. Inverting the Data Warehouse
       Inverted data warehouse: consider a normal data warehouse. To build a data warehouse:
       ◦ Normal way: put data first into disk storage, then (after the data ages) move it to near-line or archival storage.
       ◦ Alternative way: first enter data into near-line storage (not disk storage).
          ▪ Data is "staged" from the near-line environment to the disk environment to be accessed and analyzed.
          ▪ After the analysis is over, the data is returned to near-line storage.
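
The inverted flow on slide 16 (load into near-line first, stage to disk for analysis, return afterwards) can be sketched with a context manager. Everything here is an assumption made for illustration: the dictionaries standing in for the two environments, the `staged` helper, and the sample data set names and values.

```python
from contextlib import contextmanager

# In the inverted warehouse, data is entered into near-line storage first.
near_line = {"2019-sales": [120, 80, 45], "2020-sales": [200, 10]}
disk: dict[str, list[int]] = {}  # starts empty; holds only staged data


@contextmanager
def staged(keys):
    """Stage the named data sets onto disk, then return them to near-line."""
    keys = list(keys)
    missing = [k for k in keys if k not in near_line]
    if missing:
        raise KeyError(missing)
    for k in keys:
        disk[k] = near_line.pop(k)
    try:
        yield disk
    finally:
        # After the analysis is over, the data goes back to near-line storage.
        for k in keys:
            near_line[k] = disk.pop(k)


with staged(["2019-sales"]) as working_set:
    total_2019 = sum(working_set["2019-sales"])  # analysis runs against disk

print(total_2019, sorted(near_line), disk)
```

The `finally` clause returns the data even if the analysis raises, so the disk environment holds nothing between analyses.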
    17. Total Cost
       ◦ With the introduction of near-line and archival storage, the growing costs of a data warehouse can be mitigated.
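
A rough cost comparison shows why splitting storage mitigates cost. The unit prices and volumes below are invented for illustration and do not come from the slides; only the direction of the result (near-line storage is the inexpensive medium) is taken from the source.

```python
def all_disk_cost(total_tb: int, disk_cost_per_tb: int) -> int:
    """Cost when every terabyte sits on high-performance disk."""
    return total_tb * disk_cost_per_tb


def split_cost(active_tb: int, dormant_tb: int,
               disk_cost_per_tb: int, near_line_cost_per_tb: int) -> int:
    """Cost when only actively used data stays on disk."""
    return active_tb * disk_cost_per_tb + dormant_tb * near_line_cost_per_tb


# Hypothetical figures: 100 TB warehouse, 10 TB actively used,
# disk at 1000 per TB, near-line at 100 per TB (arbitrary currency units).
disk_only = all_disk_cost(total_tb=100, disk_cost_per_tb=1000)
split = split_cost(active_tb=10, dormant_tb=90,
                   disk_cost_per_tb=1000, near_line_cost_per_tb=100)

print(f"all on disk: {disk_only}, split storage: {split}")
```

The gap widens as the dormant share grows, which matches the usage pattern described on slides 6 and 7.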
    18. Maximum Capacity
       ◦ "XYZ machine can handle up to nnn terabytes of data."
       ◦ Parameters that measure a machine's capacity:
          ▪ Volumes of data
          ▪ Number of users
          ▪ Workload complexity
       ◦ The balanced case is where there is a fair amount of data, a fair number of users, and a reasonably complex workload.
    19. Summary
       ◦ Data warehouses grow large explosively.
       ◦ The data inside the warehouse separates into one of two classes: frequently used data or infrequently used data.
       ◦ Without near-line and/or archival storage, the costs of the data warehouse skyrocket as the data warehouse grows large.
       http://it-slideshares.blogspot.com/
