04.01 file organization



Storage Devices and Their Characteristics, Storage Hierarchy, Memory Hierarchy, Digital Data Devices



  1. DBMS: File Organization for Conventional DBMS. Bishal Ghimire, bishal.ghimire@gmail.com
  2. Storage Devices and Their Characteristics
  3. Storage Hierarchy
     • Primary storage is the top level and is made up of CPU registers, CPU cache, and main memory, which are the only components directly accessible to the system's CPU. The CPU can continuously read data stored in these areas and execute instructions quickly and in a uniform manner.
     • Secondary storage differs from primary storage in that it is not directly accessible by the CPU. A system uses input/output (I/O) channels to connect to secondary storage; these channels control the flow of data on request.
  4. Storage Hierarchy
     • Secondary storage is non-volatile, so it does not lose data when it is powered down. Consequently, modern computer systems tend to have more secondary storage than primary storage. Secondary storage typically consists of hard disk drives (HDD), often set up in a RAID configuration; older installations also included removable media such as magneto-optical (MO) disks.
  5. Storage Hierarchy
     • Tertiary storage is mainly used for backup and archival of data. Although it is based on the slowest devices, it can be classed as the most important in terms of protecting data against the variety of disasters that can affect an IT infrastructure. Most devices in this segment are automated via robotics and software to reduce management costs and the risk of human error, and consist primarily of disk- and tape-based backup devices.
  6. Storage Hierarchy
     • Offline storage is the final category and is where removable storage media sit, such as tape cartridges and optical discs (CD and DVD). Offline storage can be used to transfer data between systems, and it also allows data to be secured offsite so that companies always have a copy of valuable data in the event of a disaster.
  7. Memory Hierarchy
  8. Digital data devices
  9. Hard disk internal structure
  10. Hard disk internal structure
  11. Hard disk internal structure
  12. Checksum
  13. Checksum
     • Checksums are used to ensure the integrity of data portions during transmission or storage. A checksum is essentially a calculated summary of such a data portion.
     • Network data transmissions often produce errors, such as toggled, missing, or duplicated bits.
     • Some checksum algorithms are able to correct (simple) errors by calculating where the error must be and repairing it.
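The idea of a checksum as a "calculated summary" can be illustrated with a minimal sketch. The additive byte sum below is chosen only because it is easy to follow; it is an assumption for illustration, not the algorithm the slides have in mind, and it only detects errors rather than correcting them. The function names are hypothetical.

```python
def additive_checksum(data: bytes) -> int:
    """Sum all bytes modulo 256 (a deliberately simple, weak checksum)."""
    return sum(data) % 256


def verify(data: bytes, expected: int) -> bool:
    """Recompute the checksum and compare it with the stored or transmitted one."""
    return additive_checksum(data) == expected


message = b"file organization"
stored = additive_checksum(message)

# Simulate a single toggled bit in the first byte during transmission.
corrupted = bytes([message[0] ^ 0b00000100]) + message[1:]

print(verify(message, stored))    # intact data passes the check
print(verify(corrupted, stored))  # the toggled bit changes the sum and is detected
```

A sum this simple misses many error patterns (for example, two bytes swapped), which is why practical systems tend to use stronger functions such as CRCs.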
  14. Parity Bits
  15. Disk Subsystem
     • Multiple disks are connected to a computer system through a controller.
       ▫ Controller functionality (checksums, bad-sector remapping) is often carried out by the individual disks, which reduces the load on the controller.
     • Disk interface standards families
       ▫ ATA (AT Attachment) range of standards
       ▫ SATA (Serial ATA)
       ▫ SCSI (Small Computer System Interface) range of standards
       ▫ SAS (Serial Attached SCSI)
       ▫ Several variants of each standard (different speeds and capabilities)
  16. Disk Subsystem
     • Disks are usually connected directly to the computer system.
     • In Storage Area Networks (SAN), a large number of disks are connected by a high-speed network to a number of servers.
     • In Network Attached Storage (NAS), networked storage provides a file system interface using a networked file system protocol, instead of providing a disk system interface.
  17. RAID - redundant array of independent disks
     • RAID is short for redundant array of independent (or inexpensive) disks. It is a category of disk storage that employs two or more drives in combination for fault tolerance and performance. RAID is used frequently on servers but isn't generally necessary for personal computers. RAID allows you to store the same data redundantly (in multiple places) in a balanced way to improve overall storage performance.
  18. • Level 0: Striped Disk Array without Fault Tolerance. Provides data striping (spreading out blocks of each file across multiple disk drives) but no redundancy. This improves performance but does not deliver fault tolerance. If one drive fails, all data in the array is lost.
     • Level 1: Mirroring and Duplexing. Provides disk mirroring. Level 1 provides up to twice the read transaction rate of a single disk and the same write transaction rate as a single disk.
     • Level 2: Error-Correcting Coding. Not a typical implementation and rarely used, Level 2 stripes data at the bit level rather than the block level.
  19. • Level 3: Bit-Interleaved Parity. Provides byte-level striping with a dedicated parity disk. Level 3, which cannot service multiple requests simultaneously, is also rarely used.
     • Level 4: Dedicated Parity Drive. Level 4 provides block-level striping (like Level 0) with a parity disk. If a data disk fails, the parity data is used to create a replacement disk. A disadvantage of Level 4 is that the parity disk can become a write bottleneck.
     • Level 5: Block-Interleaved Distributed Parity. Provides data striping at the block level and also stripes the parity (error correction) information across the disks. This results in excellent performance and good fault tolerance. Level 5 is one of the most popular implementations of RAID.
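How parity data can "create a replacement disk" in the parity-based levels above can be sketched with XOR. The following is a toy illustration under assumed values (three data disks, one 4-byte block each); the function name `xor_blocks` is hypothetical, and real arrays work on much larger blocks inside the controller or driver.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one 4-byte block of a stripe.
data_disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_disks)

# Suppose disk 1 fails: XOR the surviving data blocks with the parity block.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_disks[1])  # the lost block is recovered exactly
```

Because every write to a data block also requires updating the parity block, a single dedicated parity disk sees all of that traffic, which is the write bottleneck noted for Level 4; Level 5 spreads the parity blocks over all disks.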
  20. Performance Measures of Disks
     • Access time: the time from when a read or write request is issued to when data transfer begins. To access data on a given sector of a disk, the arm must first move so that it is positioned over the correct track, and then must wait for the sector to appear under it as the disk rotates. The time for repositioning the arm is called seek time, and it increases with the distance the arm must move. Typical seek times range from 2 to 30 milliseconds. Average seek time is the average of the seek times measured over a sequence of (uniformly distributed) random requests, and it is about one third of the worst-case seek time.
     • Once the seek has occurred, the time spent waiting for the sector to be accessed to appear under the head is called rotational latency time. Average rotational latency time is about half of the time for a full rotation of the disk. (Typical rotational speeds of disks range from 60 to 120 rotations per second.)
     • The access time is then the sum of the seek time and the latency, and ranges from 10 to 40 milliseconds.
  21. Performance Measures of Disks
     • Data transfer rate: the rate at which data can be retrieved from or stored to the disk. Current disk systems support transfer rates from 1 to 5 megabytes per second.
     • Reliability: measured by the mean time to failure. The typical mean time to failure of disks today ranges from 30,000 to 800,000 hours (about 3.4 to 91 years).
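The arithmetic behind these performance measures can be checked with a short sketch. The example disk (120 rotations per second, 30 ms worst-case seek) is an assumed configuration picked from the ranges quoted above, and the function names are hypothetical.

```python
def average_rotational_latency_ms(rotations_per_second: float) -> float:
    """Half of the time for one full rotation, in milliseconds."""
    full_rotation_ms = 1000.0 / rotations_per_second
    return full_rotation_ms / 2.0


def average_seek_ms(worst_case_seek_ms: float) -> float:
    """Rule of thumb from the slides: about one third of the worst case."""
    return worst_case_seek_ms / 3.0


def mttf_years(hours: float) -> float:
    """Convert a mean time to failure from hours to 365-day years."""
    return hours / (24 * 365)


# Assumed example disk: 120 rotations per second, 30 ms worst-case seek.
latency = average_rotational_latency_ms(120)
seek = average_seek_ms(30)
access = seek + latency  # access time = seek time + rotational latency

print(f"average rotational latency: {latency:.2f} ms")
print(f"average seek time: {seek:.2f} ms")
print(f"average access time: {access:.2f} ms")
print(f"MTTF 30,000 h = {mttf_years(30_000):.1f} years")
print(f"MTTF 800,000 h = {mttf_years(800_000):.1f} years")
```

For this assumed disk the average access time comes to roughly 14 ms, inside the 10 to 40 ms range given above, and the two MTTF conversions reproduce the "about 3.4 to 91 years" figure.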
  22. Optimization of Disk-Block Access
     • Data is transferred between disk and main memory in units called blocks.
     • A block is a contiguous sequence of bytes from a single track of one platter.
     • Block sizes range from 512 bytes to several thousand bytes.
     • The lower levels of the file system manager convert block addresses into the hardware-level cylinder, surface, and sector numbers.
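The conversion from a block address to cylinder, surface, and sector can be sketched as follows. This is a simplified model under stated assumptions: one block per sector, zero-based sector numbers, and the same number of sectors on every track (real disks use zoned recording and do the mapping in firmware). The type and function names are hypothetical.

```python
from typing import NamedTuple


class DiskGeometry(NamedTuple):
    surfaces: int            # number of recording surfaces (one head each)
    sectors_per_track: int   # assumed constant across all tracks


class PhysicalAddress(NamedTuple):
    cylinder: int
    surface: int
    sector: int


def block_to_physical(block: int, geometry: DiskGeometry) -> PhysicalAddress:
    """Map a linear block number to (cylinder, surface, sector)."""
    blocks_per_cylinder = geometry.surfaces * geometry.sectors_per_track
    # Fill every track of one cylinder before moving the arm to the next.
    cylinder, offset = divmod(block, blocks_per_cylinder)
    surface, sector = divmod(offset, geometry.sectors_per_track)
    return PhysicalAddress(cylinder, surface, sector)


geometry = DiskGeometry(surfaces=4, sectors_per_track=63)
print(block_to_physical(0, geometry))
print(block_to_physical(1000, geometry))
```

Numbering blocks so that a whole cylinder is filled before the arm moves means consecutive block numbers can be read without a seek, which ties in with the placement advice in the following slides.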
  23. • Access to data on disk is several orders of magnitude slower than access to data in main memory. Optimization techniques besides buffering of blocks in main memory include the following.
     • Scheduling: if several blocks from a cylinder need to be transferred, we may save time by requesting them in the order in which they pass under the heads. A commonly used disk-arm scheduling algorithm is the elevator algorithm.
     • File organization: organize blocks on disk in a way that corresponds closely to the manner in which we expect the data to be accessed. For example, store related information on the same track, on physically close tracks, or on adjacent cylinders in order to minimize seek time. IBM mainframe operating systems give programmers fine control over the placement of files, but this increases the programmer's burden.
  24. • Nonvolatile write buffers: use nonvolatile RAM (such as battery-backed RAM) to speed up disk writes drastically (first write to the nonvolatile RAM buffer and inform the OS that the write has completed).
     • Log disk: another approach to reducing write latency is to use a log disk, a disk devoted to writing a sequential log. All access to the log disk is sequential, essentially eliminating seek time, and several consecutive blocks can be written at once, making writes to the log disk several times faster than random writes.
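The elevator algorithm named under scheduling above can be sketched in a few lines. This is a minimal illustration under assumptions: requests are plain cylinder numbers, all of them are known up front, and the arm reverses at the last pending request rather than at the physical edge of the disk (the variant often called LOOK). The function names and the request queue are hypothetical.

```python
def elevator_order(pending: list[int], head: int, moving_up: bool = True) -> list[int]:
    """Return the order in which pending cylinder requests are serviced."""
    at_head = [c for c in pending if c == head]
    upper = sorted(c for c in pending if c > head)
    lower = sorted((c for c in pending if c < head), reverse=True)
    # Sweep in the current direction first, then reverse and sweep back.
    first, second = (upper, lower) if moving_up else (lower, upper)
    return at_head + first + second


def total_movement(order: list[int], head: int) -> int:
    """Total number of cylinders the arm crosses when servicing `order`."""
    distance = 0
    for cylinder in order:
        distance += abs(cylinder - head)
        head = cylinder
    return distance


requests = [98, 183, 37, 122, 14, 124, 65, 67]
start = 53

order = elevator_order(requests, start, moving_up=True)
print(order)
print("elevator:", total_movement(order, start), "cylinders")
print("arrival order:", total_movement(requests, start), "cylinders")
```

For this assumed queue the elevator order moves the arm 299 cylinders, against 640 when the same requests are serviced in arrival order, which shows why sweeping in one direction reduces seek time.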
  25. Optical Disks