Lecture 12 The Really Large Data Warehouse

Building the Data WareHouse http://it-slideshares.blogspot.com

Published in: Education
Notes
  • Historical data, detailed data, and diverse data together add up to lots of data
  • Splitting data over multiple storage media based on frequency of usage
  • Archival storage is very similar to near-line storage, except that in archival storage the probability of access drops very low. To put the probability of access in perspective, consider the following simple chart:
      ◦ High-performance disk storage: access a unit of data once a month
      ◦ Near-line storage: access 0.5 units of data every year
      ◦ Archival storage: access 0.1 units of data every decade
    Near-line storage can be thought of as a logical extension of the data warehouse. Archival storage cannot be thought of as a logical extension.
  • Options for Moving Data:
      ◦ Manual. Advantages: very simple; available immediately; operates at the row level. Disadvantages: prone to error; requires human interaction.
      ◦ HSM. Advantages: relatively simple; not too expensive; fully automated. Disadvantages: operates at the data set level.
      ◦ CMSM. Advantages: fully automated; operates at the row level. Disadvantages: expensive; complex to implement and operate.
  • Third-party monitors are much better because the monitors supplied by the DBMS vendors require far more resources than those supplied by third parties.
  • The Extension of the Data Warehouse across Different Storage Media: The data warehouse can grow to petabytes (a petabyte is a quadrillion bytes) of data and can still be effective and still be managed.
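The access-frequency chart in the note on archival storage mixes three time scales (month, year, decade). A minimal sketch that puts the three tiers on a common per-year scale follows. The figures come from the chart; the dictionary layout and the helper name `relative_access` are illustrative assumptions.

```python
# Accesses per unit of data per year, derived from the chart in the notes.
# The structure and names here are illustrative, not taken from the slides.
ACCESSES_PER_YEAR = {
    "high-performance disk": 1 * 12,  # one access per month
    "near-line": 0.5,                 # 0.5 accesses per year
    "archival": 0.1 / 10,             # 0.1 accesses per decade
}


def relative_access(tier: str, baseline: str = "high-performance disk") -> float:
    """Return the access rate of `tier` as a fraction of the baseline tier's rate."""
    return ACCESSES_PER_YEAR[tier] / ACCESSES_PER_YEAR[baseline]


for tier, rate in ACCESSES_PER_YEAR.items():
    print(f"{tier}: {rate:g} accesses/year ({relative_access(tier):.5f} of the disk rate)")
```

On this scale the gap between tiers is easier to compare than in the original mixed units.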
  • Transcript

    • 1. Building the Data Warehouse, by Inmon. Chapter 12: The Really Large Data Warehouse. http://it-slideshares.blogspot.com/
    • 2. Why the Rapid Growth? / The Impact of Large Volumes of Data / Disk Storage in the Face of Data Separation / Moving Data from One Environment to Another / Inverting the Data Warehouse / Total Cost / Maximum Capacity / Summary
    • 3. Why the Rapid Growth?
      ◦ The data warehouse contains history.
      ◦ Data warehouses collect data at the most granular level.
      ◦ Lots of different kinds of data need to be brought together.
    • 4. The Impact of Large Volumes of Data
      ◦ Basic Data-Management Activities: As data volumes grow large, normal database functions require increasingly larger amounts of resources.
      ◦ The Cost of Storage: As the volume of data grows, the cost of the data increases dramatically.
    • 5. The Impact of Large Volumes of Data
      ◦ The Real Costs of Storage: There are lots of components to disk storage aside from the storage device itself: disk controller, communications lines, processor, software.
    • 6. The Impact of Large Volumes of Data
      ◦ The Usage Pattern of Data in the Face of Large Volumes: Over time, as the volume of data grows, the percentage of data actually used drops.
    • 7. The Impact of Large Volumes of Data
      ◦ A Simple Calculation: Usage ratio = Actual bytes used / Total data warehouse bytes. As the volume of data found in your data warehouse goes up, the actual percentage used goes down.
      ◦ Two Classes of Data: Infrequently used data is often called dormant data or inactive data. Frequently used data is often called actively used data.
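The usage ratio on slide 7 can be sketched as a one-line function. The byte figures below are hypothetical, chosen only to show the ratio falling as the warehouse grows while the actively used volume stays roughly constant.

```python
def usage_ratio(bytes_used: int, total_bytes: int) -> float:
    """Usage ratio = actual bytes used / total data warehouse bytes."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return bytes_used / total_bytes


# Hypothetical figures: about 400 GB stays in active use
# while the warehouse grows from 1 TB to 50 TB.
GB = 10**9
TB = 10**12
for total_tb in (1, 10, 50):
    ratio = usage_ratio(400 * GB, total_tb * TB)
    print(f"{total_tb:>3} TB warehouse: {ratio:.1%} of bytes actively used")
```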
    • 8. The Impact of Large Volumes of Data
      ◦ Implications of Separating Data into Two Classes
    • 9. Disk Storage in the Face of Data Separation
      ◦ Near-Line Storage: Near-line storage is, depending on the vendor, sequential storage.
      ◦ Characteristics: robotically controlled; inexpensive; bulk amounts of data; reliable over a long period of time; seconds to access first record.
    • 10. Disk Storage in the Face of Data Separation
      ◦ Access Speed and Disk Storage: The difference between freely flowing blood and blood with many restricting components.
    • 11. Disk Storage in the Face of Data Separation
      ◦ Archival Storage: Split storage is needed to manage large amounts of data.
      ◦ It is a further tier besides disk storage and near-line (bulk) storage.
      ◦ It is different from near-line storage.
    • 12. Disk Storage in the Face of Data Separation
      ◦ Implications of Transparency: A record or row in the data warehouse is identical to a record or row in near-line storage.
    • 13. Moving Data from One Environment to Another
      ◦ There are several ways to move data:
      ◦ Have a database administrator move data manually.
      ◦ Use hierarchical storage management (HSM).
      ◦ Use the cross-media storage management (CMSM) option.
    • 14. Moving Data from One Environment to Another
      ◦ The CMSM Approach: The CMSM technology is fully automated.
      ◦ The CMSM is software that makes the physical location of the data transparent.
      ◦ The end user does not need to know where data is, whether in the data warehouse or on near-line storage.
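The transparency described on slide 14 can be illustrated with a toy storage layer. This is only a sketch under assumptions: the class `CrossMediaStore`, its method names, and the choice to stage a row back to disk on read are all hypothetical and do not describe any actual CMSM product.

```python
from typing import Dict, Optional


class CrossMediaStore:
    """Toy stand-in for a CMSM layer: callers read by key and never name a tier."""

    def __init__(self) -> None:
        self.disk: Dict[str, dict] = {}       # actively used rows
        self.near_line: Dict[str, dict] = {}  # dormant rows

    def write(self, key: str, row: dict) -> None:
        """New rows land on disk in this sketch."""
        self.disk[key] = row

    def demote(self, key: str) -> None:
        """Move a dormant row from disk to near-line storage, unchanged."""
        self.near_line[key] = self.disk.pop(key)

    def read(self, key: str) -> Optional[dict]:
        """Return the row wherever it lives, staging it back to disk if needed."""
        if key in self.disk:
            return self.disk[key]
        if key in self.near_line:
            # Row-level staging: the caller never sees this move.
            self.disk[key] = self.near_line.pop(key)
            return self.disk[key]
        return None


store = CrossMediaStore()
store.write("order-1", {"amount": 120})
store.demote("order-1")
print(store.read("order-1"))
```

The caller uses the same `read` call before and after `demote`, which is the point of the transparency: the row is identical in either tier and its location is an internal detail.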
    • 15. Moving Data from One Environment to Another
      ◦ A Data Warehouse Usage Monitor: Streamlines the operations of the CMSM environment.
      ◦ Two types: those supplied by the DBMS vendor, and those supplied by third parties.
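A usage monitor's core job, deciding which data is actively used and which is dormant, might be sketched as follows. The table-level granularity, the 180-day threshold, and the function name `classify_tables` are assumptions for illustration; the slides do not specify how a monitor draws the line.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple


def classify_tables(access_log: Iterable[Tuple[str, date]],
                    all_tables: Set[str],
                    today: date,
                    dormant_after_days: int = 180) -> Dict[str, str]:
    """Label each table 'active' or 'dormant' from its most recent access."""
    last_seen: Dict[str, date] = {}
    for table, accessed_on in access_log:
        if table not in last_seen or accessed_on > last_seen[table]:
            last_seen[table] = accessed_on
    cutoff = today - timedelta(days=dormant_after_days)
    # Tables never seen in the log count as dormant.
    return {
        t: "active" if t in last_seen and last_seen[t] >= cutoff else "dormant"
        for t in sorted(all_tables)
    }


log = [("sales", date(2024, 5, 1)), ("sales", date(2024, 6, 1)),
       ("returns", date(2022, 1, 15))]
labels = classify_tables(log, {"sales", "returns", "audit"}, today=date(2024, 6, 30))
print(labels)
```

The dormant set is the natural input for a move to near-line storage.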
    • 16. Inverting the Data Warehouse
      ◦ Inverted data warehouse: consider first how a normal data warehouse is built.
      ◦ Normal way: put data first into disk storage; after the data ages, move it to near-line or archival storage.
      ◦ Alternative way: first enter data into near-line storage (not disk storage); data is “staged” from the near-line environment to the disk environment to be accessed and analyzed; after the analysis is over, the data is returned to near-line storage.
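The inverted flow on slide 16 (enter into near-line storage, stage to disk for analysis, release afterwards) can be sketched with a context manager. The dictionaries and the names `load` and `staged` are hypothetical. As a simplification, this sketch keeps the near-line copy in place and drops only the disk copy, rather than physically moving the data back.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List

near_line: Dict[str, List[dict]] = {}  # point of entry in the inverted design
disk: Dict[str, List[dict]] = {}       # temporary working area for analysis


def load(dataset: str, rows: List[dict]) -> None:
    """New data is entered into near-line storage first, not disk."""
    near_line[dataset] = rows


@contextmanager
def staged(dataset: str) -> Iterator[List[dict]]:
    """Stage a data set to disk for analysis, then release the disk copy."""
    disk[dataset] = list(near_line[dataset])
    try:
        yield disk[dataset]
    finally:
        del disk[dataset]  # near-line keeps the data; disk space is freed


load("sales_2004", [{"amount": 10}, {"amount": 32}])
with staged("sales_2004") as rows:
    total = sum(r["amount"] for r in rows)
print(total, "sales_2004" in disk)
```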
    • 17. Total CostWith the introduction of near-line and archival storage, the growing costs of a data warehouse can be mitigated
    • 18. Maximum Capacity
      ◦ “XYZ machine can handle up to nnn terabytes of data.”
      ◦ Parameters that measure the machine's capacity: volume of data, number of users, workload complexity.
      ◦ The balanced case is where there is a fair amount of data, a fair number of users, and a reasonably complex workload.
    • 19. SummaryData warehouses grow large explosivelyThe data inside the warehouse separates into one of two classes—frequently used data or infrequently used dataWithout near-line and/or archival storage, the costs of the data warehouseskyrocket as the data warehouse grows largehttp://it-slideshares.blogspot.com/
