ECE7130: Advanced Computer Architecture: Storage Systems
Dr. Xubin He, http://iweb.tntech.edu/hexb
Email: hexb@tntech.edu, Tel: 931-372-3462, Brown Hall 319

Outline
• Quick overview: storage systems
• RAID
• Advanced dependability/reliability/availability
• I/O benchmarks, performance, and dependability

Storage Architectures

Disk Figure of Merit: Areal Density
• Bits recorded along a track: metric is Bits Per Inch (BPI)
• Number of tracks per surface: metric is Tracks Per Inch (TPI)
• Disk designs brag about bit density per unit area
• Metric is Bits Per Square Inch: Areal Density = BPI × TPI

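As a quick worked example of the metric (the drive parameters below are hypothetical, chosen only to exercise the formula):

```python
# Areal density = BPI * TPI (bits per square inch).
# Hypothetical drive parameters, for illustration only.
bpi = 1_500_000                 # bits per inch along a track
tpi = 250_000                   # tracks per inch across a surface
areal_density = bpi * tpi       # bits per square inch
print(f"{areal_density / 1e9:.0f} Gbit per square inch")  # -> 375
```
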
Historical Perspective
• 1956 IBM RAMAC through the early-1970s Winchester
  - Developed for mainframe computers, proprietary interfaces
  - Steady shrink in form factor: 27 in. to 14 in.
• Form factor and capacity drive the market more than performance
• 1970s developments
  - 5.25-inch floppy disk form factor (microcode into mainframe)
  - Emergence of industry-standard disk interfaces
• Early 1980s: PCs and first-generation workstations
• Mid 1980s: client/server computing
  - Centralized storage on file servers accelerates disk downsizing: 8 inch to 5.25 inch
  - Mass-market disk drives become a reality: industry standards (SCSI, IPI, IDE); 5.25-inch to 3.5-inch drives for PCs; end of proprietary interfaces
• 1990s: laptops => 2.5-inch drives
• 2000s: what new devices will lead to new drives?

Storage Media
• Magnetic: hard drive
• Optical: CD
• Solid-state semiconductor:
  - Flash SSD
  - RAM SSD
• MEMS-based: significantly cheaper than DRAM but much faster than a traditional HDD; high density
• Tape: sequentially accessed

Driving Forces
(Source: Anderson & Whittington, Seagate, FAST'07)

HDD Market
(Source: Anderson & Whittington, Seagate, FAST'07)

HDD Technology Trend
(Source: Anderson & Whittington, Seagate, FAST'07)

Errors
• Errors happen:
  - Media causes: bit flips, noise…
• Data detection and decoding logic, ECC

Right HDD for the Right Application
(Source: Anderson & Whittington, Seagate, FAST'07)

Interfaces
• Internal
  - SCSI: Wide SCSI, Ultra Wide SCSI…
  - IDE/ATA: Parallel ATA (PATA): PATA 66/100/133
  - Serial ATA (SATA): SATA 150, SATA II 300
  - Serial Attached SCSI (SAS)
• External
  - USB (1.0/2.0)
  - FireWire (400/800)
  - eSATA: external SATA

SATA/SAS Compatibility
(Source: Anderson & Whittington, Seagate, FAST'07)

Top HDD Manufacturers
• Western Digital
• Seagate (also owns Maxtor)
• Fujitsu
• Samsung
• Hitachi
• IBM used to make hard disk drives but sold that division to Hitachi.

Top Companies Providing Storage Solutions
• Adaptec
• EMC
• QLogic
• IBM
• Hitachi
• Brocade
• Cisco
• HP
• Network Appliance
• Emulex

Fastest-Growing Storage Companies (07/2008)
(Source: storagesearch.com)

| Company                | Yearly growth | Main product              |
|------------------------|---------------|---------------------------|
| Coraid                 | 370%          | NAS                       |
| ExaGrid Systems        | 337%          | Disk-to-disk backup       |
| RELDATA                | 300%          | iSCSI                     |
| Voltaire               | 194%          | InfiniBand                |
| Transcend Information  | 166%          | SSD                       |
| OnStor                 | 157%          | NAS                       |
| Bluepoint Data Storage | 140%          | Online backup and storage |
| Compellent             | 124%          | NAS                       |
| Alacritech             | 100%          | iSCSI                     |
| Intransa               | 100%          | iSCSI                     |

Solid State Drive
• A data storage device that uses solid-state memory to store persistent data.
• An SSD uses either Flash non-volatile memory (Flash SSD) or DRAM volatile memory (RAM SSD).

Advantages of SSD
• Faster startup: no spin-up
• Fast random access for reads
• Extremely low read/write latency: much smaller seek time
• Quiet: no moving parts
• High mechanical reliability: endures shock and vibration
• Balanced performance across the entire storage device

Disadvantages of SSD
• Price: unit price of an SSD is roughly 20x that of an HDD
• Limited write cycles (Flash SSD): 30-50 million
• Slower write speed (Flash SSD): erase blocks
• Lower storage density
• Vulnerable to some effects: abrupt power loss (RAM SSD), magnetic fields, and electric/static charges

Top SSD Manufacturers (as of 1Q 2008; source: storagesearch.com)

| Rank | Manufacturer         | SSD Technology |
|------|----------------------|----------------|
| 1    | BitMICRO Networks    | Flash SSD      |
| 2    | STEC                 | Flash SSD      |
| 3    | Mtron                | Flash SSD      |
| 4    | Memoright            | Flash SSD      |
| 5    | SanDisk              | Flash SSD      |
| 6    | Samsung              | Flash SSD      |
| 7    | Adtron               | Flash SSD      |
| 8    | Texas Memory Systems | RAM/Flash SSD  |
| 9    | Toshiba              | Flash SSD      |
| 10   | Violin Memory        | RAM SSD        |

Networked Storage
• Network Attached Storage (NAS)
  - Storage accessed over TCP/IP, using industry-standard file-sharing protocols like NFS, HTTP, and Windows Networking
  - Provides file-system functionality
  - Consumes LAN bandwidth of servers
• Storage Area Network (SAN)
  - Storage accessed over a Fibre Channel switching fabric, using encapsulated SCSI
  - Block-level storage system
  - Fibre Channel SAN
  - IP SAN: implements a SAN over well-known TCP/IP
    iSCSI: cost-effective; SCSI over TCP/IP

Advantages
• Consolidation
• Centralized data management
• Scalability
• Fault resiliency

NAS and SAN Shortcomings
• SAN shortcomings
  - Data to desktop
  - Sharing between NT and UNIX
  - Lack of standards for file access and locking
• NAS shortcomings
  - Shared tape resources
  - Number of drives
  - Distance to tapes/disks
• NAS focuses on applications, users, and the files and data that they share
• SAN focuses on disks, tapes, and a scalable, reliable infrastructure to connect them
• NAS plus SAN: the complete solution, from desktop to data center to storage device

Organizations
• IEEE Computer Society Mass Storage Systems Technical Committee (MSSTC or TCMS): http://www.msstc.org
• IETF: www.ietf.org
  - IPS, IMSS
• International Committee for Information Technology Standards: www.incits.org
  - Storage: B11, T10, T11, T13
• Storage mailing list: storage [email_address]
• INSIC: Information Storage Industry Consortium: www.insic.org
• SNIA: Storage Networking Industry Association: http://www.snia.org
  - Technical work groups

Conferences
• FAST: USENIX: http://www.usenix.org/event/fast09/
• SC: IEEE/ACM: http://sc08.supercomputing.org/
• MSST: IEEE: http://storageconference.org/
  - SNAPI, SISW, CMPD, DAPS
• NAS: IEEE: http://www.eece.maine.edu/iwnas/
• SNW: SNIA: Storage Networking World
• SNIA Storage Developer Conference: http://www.snia.org/events/storage-developer2008/
• Other conferences with storage components:
  - IPDPS, ICDCS, ICPP, AReS, PDCS, MASCOTS, HPDC, IPCCC, ccGrid…

Awards in Storage Systems
• IEEE Reynold B. Johnson Information Storage Systems Award
  - Sponsored by the IBM Almaden Research Center
• 2008 recipient: Alan J. Smith, Professor, University of California at Berkeley, Berkeley, CA, USA
• 2007 co-recipients:
  - Dave Hitz, Executive Vice President and Co-Founder, Network Appliance, Sunnyvale, CA
  - James Lau, Chief Strategy Officer, Executive Vice President and Co-Founder, Network Appliance, Sunnyvale, CA
• 2006 recipient: Jaishankar M. Menon, Director of Storage Systems Architecture and Design, IBM, San Jose, CA
  - "For pioneering work in the theory and application of RAID storage systems."
• More: http://www.ieee.org/portal/pages/about/awards/sums/johnson.html

HPC Storage Challenge: SC
• HPC systems comprise three major subsystems: processing, networking, and storage. In different applications, any one of these subsystems can limit overall system performance. The HPC Storage Challenge is a competition showcasing effective approaches to using the storage subsystem, which is often the limiting one, with actual applications.
• Participants must describe their implementations and present measurements of performance, scalability, and storage subsystem utilization. Judging is based on these measurements as well as innovation and effectiveness; maximum size and peak performance are not the sole criteria.
• Finalists are chosen on the basis of submissions, which take the form of a proposal; submissions are encouraged to include reports of work in progress. Participants with access to either large or small HPC systems are encouraged to enter.

Research Hotspots
• Energy-efficient storage: CMU, UIUC
• Scalable storage meets petaflops: IBM, UCSC
• High-availability and reliable storage systems: UC Berkeley, CMU, TTU, UCSC
• Object storage (OSD): CMU, HUST, UCSC
• Storage virtualization: CMU
• Intelligent storage systems: UC Berkeley, CMU

Energy Efficient Storage
• Energy aware:
  - Disk level: SSD
  - I/O and file-system level: efficient memory, cache, networked storage
  - Application level: data layout
• Green Storage Initiative (GSI): SNIA
  - Energy-efficient storage networking solutions
  - Storage administrators and infrastructure

Peta-Scale Scalable Storage
• Challenge: building storage that can keep up with ever-increasing speeds, multi-core designs, and petaflop computing capabilities.
• Latencies to disks are not keeping up with peta-scale computing.
• Innovative approaches are needed.

My Research in High-Performance and Reliable Storage Systems
• Active/active service for high-availability computing
  - Active/active metadata service
• Networked storage systems
  - STICS
  - iRAID
• A unified multiple-level cache for high-performance storage systems
  - iPVFS
  - iCache
• Performance-Adaptive UDP for bulk data transfer over dedicated network links: PA-UDP

Improving Disk Performance
• Use large sectors to improve bandwidth
• Use track caches and read-ahead:
  - Read the entire track into the on-controller cache
  - Exploit locality (improves both latency and bandwidth)
• Design file systems to maximize locality:
  - Allocate files sequentially on disk (exploit the track cache)
  - Locate similar files in the same cylinder (reduce seeks)
  - Locate similar files in nearby cylinders (reduce seek distance)
• Pack bits closer together to improve transfer rate and density
• Use a collection of small disks to form a large, high-performance one: a disk array
• Stripe data across multiple disks to allow parallel I/O, improving performance

Use Arrays of Small Disks?
• Katz and Patterson asked in 1987: can smaller disks be used to close the gap in performance between disks and CPUs?
(Figure: the conventional line of four disk designs, 14", 10", 5.25", and 3.5", from low end to high end, versus a disk array built from a single 3.5" design.)

Replace a Small Number of Large Disks with a Large Number of Small Disks! (1988 disks)

|           | IBM 3390K  | IBM 3.5" 0061 | x70 array   |
|-----------|------------|---------------|-------------|
| Capacity  | 20 GBytes  | 320 MBytes    | 23 GBytes   |
| Volume    | 97 cu. ft. | 0.1 cu. ft.   | 11 cu. ft.  |
| Power     | 3 KW       | 11 W          | 1 KW        |
| Data rate | 15 MB/s    | 1.5 MB/s      | 110 MB/s    |
| I/O rate  | 600 I/Os/s | 55 I/Os/s     | 3900 I/Os/s |
| MTTF      | 250 KHrs   | 50 KHrs       | ??? Hrs     |
| Cost      | $250K      | $2K           | $150K       |

Versus the 3390K, the 70-disk array offers roughly 9X less volume, 3X less power, 8X the data rate, and 6X the I/O rate. Disk arrays have potential for large data and I/O rates, high MB per cu. ft., and high MB per KW, but what about reliability?

Array Reliability
• MTTF (Mean Time To Failure): the average time that a non-repairable component will operate before experiencing failure
• Reliability of N disks = reliability of 1 disk ÷ N
  - 50,000 hours ÷ 70 disks ≈ 700 hours
• Disk system MTTF drops from 6 years to 1 month!
• Arrays without redundancy are too unreliable to be useful!
• Solution: redundancy.

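A minimal sketch of the arithmetic behind this slide, assuming independent disk failures so the array's MTTF scales as 1/N:

```python
# MTTF of an N-disk array with no redundancy: any single failure
# loses data, so array MTTF ~ single-disk MTTF / N
# (assuming independent failures).
disk_mttf_hours = 50_000
n_disks = 70

array_mttf = disk_mttf_hours / n_disks
print(f"single disk: {disk_mttf_hours / 8760:.1f} years")  # ~5.7 years
print(f"70-disk array: {array_mttf:.0f} hours")            # ~714 hours, about a month
```
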
Redundant Arrays of (Inexpensive) Disks
• Replicate data over several disks so that no data is lost if one disk fails
• Redundancy yields high data availability
  - Availability: service is still provided to the user even if some components have failed
• Disks will still fail
• Contents are reconstructed from data redundantly stored in the array
  - Capacity penalty to store redundant info
  - Bandwidth penalty to update redundant info

Levels of RAID
• The original RAID paper described five categories, RAID levels 1-5 (Patterson et al., "A case for redundant arrays of inexpensive disks (RAID)", ACM SIGMOD, 1988)
• Disk striping with no redundancy is now called RAID 0 (often loosely equated with JBOD, "just a bunch of disks")
• Other kinds have been proposed in the literature: Level 6 (P+Q redundancy), Level 10, RAID 53, etc.
• Except for RAID 0, all RAID levels trade disk capacity for reliability, and the extra reliability makes parallelism a practical way to improve performance

RAID 0: Nonredundant (JBOD)
• High I/O performance
  - Data is not saved redundantly
  - A single copy of the data is striped across multiple disks
• Low cost
  - Lack of redundancy
• Least reliable: a single disk failure leads to data loss
(Figure: a file's blocks 0-3 striped across disks 0-3.)

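A minimal sketch of RAID 0 block placement (round-robin striping with block-sized striping units and 4 disks, matching the figure; real controllers differ in details such as stripe-unit size):

```python
N_DISKS = 4  # as in the figure

def locate(logical_block: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    return logical_block % N_DISKS, logical_block // N_DISKS

for b in range(8):
    disk, offset = locate(b)
    print(f"logical block {b} -> disk {disk}, offset {offset}")
```
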
RAID 1: Disk Mirroring/Shadowing
• Each disk is fully duplicated onto its "mirror"
  - Very high availability can be achieved
• Bandwidth sacrifice on write: a logical write = two physical writes
• Reads may be optimized to minimize queueing and disk search time
• Most expensive solution: 100% capacity overhead
• Targeted for high-I/O-rate, high-availability environments
(Figure: mirrored pairs form a recovery group.)

RAID 2: Memory-Style ECC
• Multiple disks record ECC information to determine which disk has failed
• A parity disk is then used to reconstruct corrupted or lost data
• Needs log2(number of disks) redundancy disks
(Figure: data disks b0-b3, ECC disks f0(b) and f1(b), and a parity disk P(b).)

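The log2 figure comes from the single-error-correcting Hamming condition; a small sketch of that count (my formulation, not from the slide):

```python
# Hamming condition: c check disks can identify the failed disk among
# d data disks when 2**c >= d + c + 1.
def check_disks(data_disks: int) -> int:
    c = 1
    while 2**c < data_disks + c + 1:
        c += 1
    return c

print(check_disks(4))    # -> 3 check disks for 4 data disks
print(check_disks(10))   # -> 4 check disks for 10 data disks
```
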
RAID 3: Bit (Byte) Interleaved Parity
• Only needs one parity disk
• Writes/reads access all disks
• Only one request can be serviced at a time
• Easy to implement
• Provides high bandwidth but not high I/O rates
• Targeted for high-bandwidth applications: multimedia, image processing
(Figure: a logical record bit-striped across the data disks as physical records, with parity stored on disk P.)

RAID 3
• A sum computed across the recovery group protects against hard disk failures; it is stored on the P disk
• Logically a single high-capacity, high-transfer-rate disk: good for large transfers
• Wider arrays reduce capacity costs but decrease availability
• 12.5% capacity cost for parity in this configuration

Inspiration for RAID 4
• RAID 3 relies on the parity disk to discover errors on a read
• But every sector has its own error-detection field
• Rely on the error-detection field to catch errors on read, not on the parity disk
• This allows independent reads to different disks simultaneously

RAID 4: Block Interleaved Parity
• Blocks are the striping units
• Allows parallel access by multiple I/O requests: high I/O rates
• Multiple small reads are now faster than before (a small read request can be restricted to a single disk)
• Large writes (a full stripe) compute the parity directly:
  P' = d0' + d1' + d2' + d3'   (+ denotes XOR)
• Small writes (e.g., a write to d0) update the parity incrementally:
  P  = d0 + d1 + d2 + d3
  P' = d0' + d1 + d2 + d3 = P + d0' + d0
• However, writes are still slow, since the parity disk is the bottleneck
(Figure: blocks 0-15 striped across four data disks, with P(0-3) through P(12-15) on a dedicated parity disk.)

Problems of Disk Arrays: Small Writes (read-modify-write procedure)
RAID-5 small-write algorithm: 1 logical write = 2 physical reads + 2 physical writes
1. Read old data D0
2. Read old parity P
3. Write new data D0'
4. Write new parity P' = P XOR D0 XOR D0'

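A minimal sketch of this read-modify-write parity update, treating blocks as byte strings and the slides' "+" as XOR:

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def new_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """P' = P xor D_old xor D_new (steps 1-2 are the reads;
    steps 3-4 write new_data and the returned parity)."""
    return xor(old_parity, xor(old_data, new_data))

# Sanity check on a 4-disk stripe: the shortcut matches full recomputation.
d = [bytes([v] * 4) for v in (1, 2, 3, 4)]
p = xor(xor(d[0], d[1]), xor(d[2], d[3]))
d0_new = bytes([9] * 4)
assert new_parity(d[0], d0_new, p) == xor(xor(d0_new, d[1]), xor(d[2], d[3]))
```
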
Inspiration for RAID 5
• RAID 4 works well for small reads
• Small writes (a write to one disk):
  - Option 1: read the other data disks, create the new sum, and write it to the parity disk
  - Option 2: since P holds the old sum, compare old data to new data and add the difference to P
• Small writes are limited by the parity disk: writes to D0 and D5 must both also write to the P disk; the parity disk must be updated on every write operation!

RAID 5: High I/O Rate Interleaved Parity
• Independent writes are possible because of the interleaved (rotated) parity
• Example: writes to D0 and D5 use disks 0, 1, 3, and 4, so they can proceed in parallel
(Figure: parity blocks P rotate across the disk columns; logical disk addresses increase down the columns.)

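A sketch of the parity rotation only, matching the figure (stripe 0 puts parity on the last disk and each later stripe shifts it one disk to the left); real arrays use several different rotation conventions:

```python
N = 5  # disks in the array, as in the figure

def parity_disk(stripe: int) -> int:
    # Stripe 0: parity on the last disk; each subsequent stripe
    # rotates parity one disk to the left, wrapping around.
    return (N - 1 - stripe) % N

for s in range(6):
    row = ["P" if d == parity_disk(s) else "D" for d in range(N)]
    print(f"stripe {s}: {' '.join(row)}")
```
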
Comparison of RAID Levels (N disks, each with capacity C)

| RAID level | Technique              | Capacity  | Advantage                                    | Disadvantage             |
|------------|------------------------|-----------|----------------------------------------------|--------------------------|
| 0          | Striping               | N x C     | Maximum data transfer rate and size          | No redundancy            |
| 1          | Mirroring              | (N/2) x C | High performance, fastest write              | Cost                     |
| 3          | Bit(byte)-level parity | (N-1) x C | Easy to implement, high error recoverability | Low performance          |
| 4          | Block-level parity     | (N-1) x C | High redundancy and better performance       | Write-related bottleneck |
| 5          | Interleaved parity     | (N-1) x C | High performance, reliability                | Small-write problem      |

RAID 6: Recovering from 2 Failures
• Why recover from more than one failure?
  - An operator accidentally replaces the wrong disk during a failure
  - Since disk bandwidth is growing more slowly than disk capacity, the mean time to repair (MTTR) a disk in a RAID system is increasing, which raises the chance of a second failure during the longer repair
  - Reading much more data during reconstruction means a greater chance of an uncorrectable media failure, which would result in data loss

RAID 6: Recovering from 2 Failures
• Network Appliance's row-diagonal parity, or RAID-DP
• Like the standard RAID schemes, it uses redundant space based on a parity calculation per stripe
• Since it protects against a double failure, it adds two check blocks per stripe of data
  - If there are p+1 disks total, p-1 disks hold data; assume p = 5
• The row parity disk is just like in RAID 4
  - Even parity across the other 4 data blocks in its stripe
• Each block of the diagonal parity disk contains the even parity of the blocks in the same diagonal

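In symbols, one common formulation (indexing conventions vary across descriptions of RAID-DP; this is my transcription, not taken verbatim from the slide). With rows i = 0..p-2, data disks j = 0..p-2, and the row-parity block of row i written as D_{i,p-1} = R_i:

```latex
R_i \;=\; \bigoplus_{j=0}^{p-2} D_{i,j},
\qquad
Q_d \;=\; \bigoplus_{\substack{0 \le i \le p-2,\; 0 \le j \le p-1 \\ (i+j) \bmod p \,=\, d}} D_{i,j},
\quad d = 0,\dots,p-2 .
```

The diagonal d = p-1 is left unstored; since each diagonal omits exactly one disk, any two-disk failure leaves some diagonal with a single missing block, which seeds the recovery chain the next slide walks through.
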
Example: p = 5
• Row-diagonal parity starts by recovering one of the 4 blocks on the failed disk using diagonal parity
  - Since each diagonal misses one disk, and all diagonals miss a different disk, 2 diagonals are missing only 1 block
• Once the data in those blocks is recovered, the standard RAID recovery scheme can be used to recover two more blocks in the standard RAID 4 stripes
• The process continues until both failed disks are restored
(Figure: 4 data disks, a row parity disk, and a diagonal parity disk; each block is labeled with its diagonal number 0-4.)

Summary: RAID Techniques
Goal was performance; popularity is due to reliability of storage.
• Disk mirroring/shadowing (RAID 1)
  - Each disk is fully duplicated onto its "shadow"
  - Logical write = two physical writes
  - 100% capacity overhead
• Parity data bandwidth array (RAID 3)
  - Parity computed horizontally
  - Logically a single high-data-bandwidth disk
• High I/O rate parity array (RAID 5)
  - Interleaved parity blocks
  - Independent reads and writes
  - Logical write = 2 reads + 2 writes