Storage Systems

Lecture on Storage Systems
http://www.rust-class.org

Engineering tradeoffs in cost, latency, and robustness in storing data

Transcript

  • 1. University of Virginia cs4414, 12 November 2013
  • 2. Why is storage complicated?
  • 3. Delay Lines
  • 4. Mercury Delay Lines
  • 7. Why Mercury? Speed of sound: air 343 m/s; mercury 1450 m/s (40° C); water 1500 m/s (25° C).
  • 8. Magnetic Core Memory: MIT Project Whirlwind, 1951: 2K 16-bit words with “no waiting”!
  • 9. SRAM: a bit held in a loop of two NOT gates
  • 10. Four-Transistor SRAM Bit
  • 11. Modern DRAM
  • 12. After Turning off Power: 5 seconds, 30 seconds, 5 minutes
  • 13. 11 cycles (at 800 MHz) to read a particular row = 13.75 ns; 185° F
  • 14. Storage Systems (device: example; time to access; cost per bit). Mercury (Gin) Delay Line: UNIVAC (1951); 220,000 ns (average); $0.38 (1968) (a bazillion n$). DRAM: Kingston KVR16N11/4 4GB DDR3 ($40); 13.75 ns; 1.16 n$. UNIVAC 1968 (core memory): $823,500 for 131K 16-bit words.
  • 15. Cheaper, More Persistent Storage
  • 16. How big is a TB?
  • 17. Storage Systems, as on slide 14, adding Hard Drive: Seagate Desktop HDD 4 TB SATA 6Gb/s NCQ 64MB; ?; 0.0046 n$.
  • 18. Accessing a Hard Drive: “seek time” ~ 0.1 ms; rotate time: 1/5900 rpm ~ max 10 ms (5900 rpm spindle)
  • 19. Passing the Drop Test
  • 20. Passing the Drop Test
  • 21. Storage Systems, as on slide 17, with the hard drive's time to access filled in: 5 ms (average).
  • 22. Storage Abstractions
  • 25. Storage Abstractions: Memory Location, File. Do we really need both? What about: database, URI?
  • 26. Unix File Abstraction
  • 27. Which are files? class24.pptx, /Users/dave/OS/classes/, OS-provided random numbers
  • 28. “Everything is a File”: class24.pptx, /mnt/cdrom, /Users/dave/OS/classes/, OS-provided random numbers, /dev/tty0, /dev/random
  • 29. An inode represents a file: Size of File (bytes), Device ID, User ID, Group ID, File Mode (permission bits), Link count (number of hard links to node), …, Diskmap
  • 30. include/linux/fs.h
  • 31. stat reports the inode fields (Size of File (bytes), Device ID, User ID, Group ID, File Mode (permission bits), Link count (number of hard links to node), …, Diskmap):
        > stat -x class24.pptx
        File: "class24.pptx"
        Size: 5855495  FileType: Regular File
        Mode: (0644/-rw-r--r--)  Uid: ( 501/ dave)  Gid: ( 20/ staff)
        Device: 1,2  Inode: 6706357  Links: 1
        Access: Wed Nov 20 15:00:41 2013
        Modify: Wed Nov 20 14:23:13 2013
        Change: Wed Nov 20 14:23:13 2013
  • 32. > ln class24.pptx todays-class.pptx
        > stat -x class24.pptx
        File: "class24.pptx"  Size: 5855495  FileType: Regular File  Mode: (0644/-rw-r--r--)  Uid: ( 501/ dave)  Gid: ( 20/ staff)  Device: 1,2  Inode: 6706357  Links: 2  Access: ..
        > stat -x todays-class.pptx
        File: "todays-class.pptx"  Size: 5855495  FileType: Regular File  Mode: (0644/-rw-r--r--)  Uid: ( 501/ dave)  Gid: ( 20/ staff)  Device: 1,2  Inode: 6706357  Links: 2
        > rm class24.pptx
        > stat -x class24.pptx
        stat: class24.pptx: stat: No such file or directory
        > stat -x todays-class.pptx
        File: "todays-class.pptx"  Size: 5855495  FileType: Regular File  Mode: (0644/-rw-r--r--)  Uid: ( 501/ dave)  Gid: ( 20/ staff)  Device: 1,2  Inode: 6706357  Links: 1
  • 33. Removing a linked file like this is very confusing for PowerPoint…
  • 34. Diskmap (Unix System 5): the inode holds Size of File (bytes), Device ID, User ID, Group ID, File Mode (permission bits), Link count (number of hard links to node), …, and a Diskmap with entries 0 … 12; entries 0 … 9 each point to a Disk Block (1K bytes).
  • 35. Diskmap (Unix System 5): entry 10 points to an Indirect Disk Block (1K bytes); at 4 bytes for each pointer, it holds 256 pointers to Disk Blocks (1K bytes).
  • 36. Diskmap (Unix System 5): entry 11 points to a Double Indirect Disk Block, whose pointers lead to Indirect Disk Blocks (1K bytes), which in turn point to Disk Blocks (1K bytes).
  • 37. Diskmap (Unix System 5), as on slide 36. How would you determine if your file system has this structure?
  • 38. Diskmap (Unix System 5), as on slide 36.
  • 39. Directories are Files Too! (ls -ali) Filename and inode: . 494211; .. 494205; .DS_Store 494212; class0 6565946; class1 6565826; class10 1467012; class11 2252968; …; class19 5649155; class2 494218; …
  • 40. > brew install tree # needed on Mac OS X, but built into most Unixes
  • 41. How to create a new file?
  • 42. Finding a Free Block: the disk layout (not to scale!) is Boot block, Superblock (holds the list of free disk blocks), I-List (inodes 0, 1, …, 98, 99), Data (blocks 0, 1, …, 98, 99).
  • 43. Finding a Free inode: same layout (not to scale!), with I-List entries 0, 1, 2, 3, … marked 0 1 0 0 …; the Superblock keeps a cache of free inodes.
  • 44. Modern File Systems
  • 45. What should a modern file system do that Unix S5FS doesn’t?
  • 46. Handling Failures: ZFS, developed for Solaris, 2005; now open source: http://open-zfs.org/. “MacZFS is free data storage and protection software for all Mac OS users. It's for people who have Mac OS, who have any data, and who really like their data. Whether on a single-drive laptop or on a massive server, it'll store your petabytes with ragingly redundant RAID reliability, and it'll keep the bit-rotted bleeps and bloops out of your iTunes library.”
  • 47. Block Checksums: the S5FS diskmap (entries 0 … 12) points at Disk Blocks (1K bytes); ZFS also keeps a Checksum (SHA-256) for each block (0: 40a3dc…, 1: 2c5829d…, 2: 955d253…, …). How do you check the checksums?
  • 48. Hashing the Hashes: Hash(B1), Hash(B2), Hash(B3), Hash(B4) over Block 1, Block 2, Block 3, Block 4
  • 49. Merkle Tree (Ralph Merkle): Hash(B1), Hash(B2), Hash(B3), Hash(B4) over Block 1, Block 2, Block 3, Block 4
  • 50. Recovery (copies = 2): instead of One Copy, keep Copy 1 and Copy 2 of every block. If the checksum fails for the first copy read, try reading the second copy.
  • 51. For the truly paranoid… Copy 1, Copy 2, Copy 3 (copies = 3)
  • 52. For the fairly paranoid but cheap… RAID: Redundant Arrays of Inexpensive Disks (ACM SIGMOD 1988) (whitehouse.gov)
  • 53. Case for RAID
  • 55. Redundancy
  • 57. Improving Performance: Cache (64MB DRAM); Adaptive Replacement Cache
  • 58. Adaptive Replacement Cache: T1 holds Recent Cache Entries; T2 holds Frequently-Used Blocks (blocks in the cache that are accessed again). “Ghost” entries: B1 is evicted from T1 (LRU); B2 is evicted from T2 (LRU). The size of T1 adapts. How should the relative size of T1 and T2 be adjusted?
  • 59. Adaptive Replacement Cache (lists as on slide 58). Hit in B1: increase the size of T1 and drop an entry from T2 to B2. Hit in B2: increase the size of T2 and drop an entry from T1 to B1.
  • 60. IBM Almaden Research Center
  • 61. Do you actually have a disk like this on your main computing device? (Cache (64MB DRAM))
  • 62. Flash Memory: Solid State Drive
  • 63. Storage Systems, as on slide 21 (hard drive time to access written as 5,000,000 ns), adding SSD: Samsung 500GB ($300); ?; 0.075 n$.
  • 67. Storage Systems (device: example; time to access; cost per bit). Mercury (Gin) Delay Line: UNIVAC (1951); 220,000 ns (average); $0.38 (1968) (a bazillion n$). DRAM: Kingston KVR16N11/4 4GB DDR3 ($40); 13.75 ns; 1.16 n$. SSD: Samsung 500GB ($300); ~10,000 ns (for random read); 0.075 n$. Disk Drive: Seagate Desktop HDD 4 TB SATA 6Gb/s NCQ 64MB; 5,000,000 ns; 0.0046 n$. (Slide annotation: “Modern Hard Drive”.)
  • 68. Charge: Storage systems should be designed around hardware capabilities and workload. Today’s OSes mostly use filesystems designed around 1990s disks and 1960s workloads! But they carry lots of clever hacks that make them work okay on today’s hardware and workloads. More from Wilkes 1967:
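Slide 13's latency figure can be checked with one line of arithmetic. At 800 MHz one clock cycle lasts 1.25 ns, so 13.75 ns corresponds to 11 cycles. The function name below is hypothetical; this is only a sketch of the conversion.

```rust
/// Convert a latency measured in memory-clock cycles into nanoseconds.
fn cycles_to_ns(cycles: u32, clock_mhz: f64) -> f64 {
    // One cycle lasts 1000 / clock_mhz ns, i.e. 1.25 ns at 800 MHz.
    f64::from(cycles) * 1000.0 / clock_mhz
}

fn main() {
    let ns = cycles_to_ns(11, 800.0);
    println!("11 cycles at 800 MHz = {ns} ns"); // prints 13.75
}
```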
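The cost-per-bit column on slides 14, 63 and 67 is price divided by capacity in bits. A sketch that reproduces the slide values appears below. The unit choices are assumptions inferred from the numbers. The DRAM figure (1.16 n$) only comes out if "GB" means 2^30 bytes, while the SSD figure (0.075 n$) assumes 10^9 bytes, which bears on slide 16's "How big is a TB?". The 131K words are read as 131 × 1024.

```rust
/// Dollars per bit for a device holding `bytes` bytes that costs `price` dollars.
fn dollars_per_bit(price: f64, bytes: f64) -> f64 {
    price / (bytes * 8.0)
}

const KIB: f64 = 1024.0;
const GIB: f64 = KIB * KIB * KIB; // "binary" gigabyte: 2^30 bytes
const GB: f64 = 1e9; // "decimal" gigabyte: 10^9 bytes

fn main() {
    // UNIVAC core (1968): $823,500 for 131K 16-bit (2-byte) words, K taken as 1024.
    let core = dollars_per_bit(823_500.0, 131.0 * KIB * 2.0);
    // Kingston 4GB DDR3 at $40, with GB read as 2^30 bytes.
    let dram = dollars_per_bit(40.0, 4.0 * GIB);
    // Samsung 500GB SSD at $300, with GB read as 10^9 bytes.
    let ssd = dollars_per_bit(300.0, 500.0 * GB);
    println!("core: ${core:.2} per bit");
    println!("DRAM: {:.2} n$ per bit", dram * 1e9);
    println!("SSD:  {:.3} n$ per bit", ssd * 1e9);
    // Slide 16: 2^40 bytes is roughly 10% more than a decimal TB (10^12 bytes).
    println!("2^40 / 10^12 = {:.4}", KIB.powi(4) / 1e12);
}
```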
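Slides 18 and 21 estimate hard-drive access time from the spindle speed. One revolution at 5900 rpm takes about 10.2 ms, and on average the wanted sector is half a revolution away. Adding the slide's ~0.1 ms seek gives roughly the 5 ms average shown on slide 21. This sketch deliberately ignores transfer time, queueing and the drive's cache; the function names are illustrative.

```rust
/// Time for one full revolution of the platter, in milliseconds.
fn rotation_ms(rpm: f64) -> f64 {
    60_000.0 / rpm // 60,000 ms per minute
}

/// Rough average access time: seek, plus on average half a revolution.
fn avg_access_ms(seek_ms: f64, rpm: f64) -> f64 {
    seek_ms + rotation_ms(rpm) / 2.0
}

fn main() {
    let rpm = 5900.0;
    println!("worst-case rotational delay: {:.1} ms", rotation_ms(rpm));
    println!("average access (0.1 ms seek): {:.1} ms", avg_access_ms(0.1, rpm));
}
```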
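The `ln`/`rm` experiment on slides 31-33 can be replayed from Rust. The sketch uses `std::fs::hard_link` and the Unix-only `std::os::unix::fs::MetadataExt`, which exposes the inode number, link count, mode, uid, gid and device. It is a rough stand-in for `stat -x`, not that tool's output format. The file names come from the slides, and the scratch directory and file contents are made up.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: exposes raw inode fields
use std::path::Path;

/// Print roughly the inode fields `stat -x` showed on slide 31.
fn show(path: &Path) -> io::Result<()> {
    let m = fs::metadata(path)?;
    println!(
        "{}: size={} mode={:o} uid={} gid={} dev={} inode={} links={}",
        path.display(), m.size(), m.mode() & 0o7777, m.uid(), m.gid(), m.dev(), m.ino(), m.nlink()
    );
    Ok(())
}

/// Replay slide 32 inside `dir`: create a file, `ln` it, then `rm` the original name.
/// Returns (link count after ln, link count after rm, whether both names shared one inode).
fn link_demo(dir: &Path) -> io::Result<(u64, u64, bool)> {
    let original = dir.join("class24.pptx");
    let alias = dir.join("todays-class.pptx");
    fs::write(&original, b"slides")?;
    fs::hard_link(&original, &alias)?; // like `ln class24.pptx todays-class.pptx`
    show(&original)?;
    show(&alias)?;
    let same_inode = fs::metadata(&original)?.ino() == fs::metadata(&alias)?.ino();
    let after_ln = fs::metadata(&alias)?.nlink();
    fs::remove_file(&original)?; // like `rm`: removes one name, not the inode
    show(&alias)?;
    let after_rm = fs::metadata(&alias)?.nlink();
    Ok((after_ln, after_rm, same_inode))
}

fn main() -> io::Result<()> {
    // A per-process scratch directory so reruns do not collide.
    let dir = std::env::temp_dir().join(format!("links-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let (after_ln, after_rm, same) = link_demo(&dir)?;
    println!("links after ln: {after_ln}, after rm: {after_rm}, same inode: {same}");
    fs::remove_dir_all(&dir)
}
```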
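Slides 34-38 describe how the System V diskmap finds a file's blocks: 10 direct entries, then indirect blocks of 256 four-byte pointers. The slide text for entry 12 is garbled, so this sketch assumes the classic System V layout, in which entry 12 is a triple indirect block. Under that assumption the largest file is (10 + 256 + 256² + 256³) × 1 KB, about 16 GiB. The `Loc` type and `locate` function are hypothetical names.

```rust
const BLOCK: u64 = 1024; // 1K-byte disk blocks
const DIRECT: u64 = 10; // diskmap entries 0..=9 point straight at data blocks
const PTRS: u64 = BLOCK / 4; // 4-byte pointers: 256 per indirect block

/// Where logical block n of a file lives (assumed System V-style diskmap).
#[derive(Debug, PartialEq)]
enum Loc {
    Direct(u64),           // entry 0..=9
    Single(u64),           // entry 10 -> indirect[i]
    Double(u64, u64),      // entry 11 -> [i] -> [j]
    Triple(u64, u64, u64), // entry 12 -> [i] -> [j] -> [k] (assumed)
}

fn locate(n: u64) -> Option<Loc> {
    let mut n = n;
    if n < DIRECT {
        return Some(Loc::Direct(n));
    }
    n -= DIRECT;
    if n < PTRS {
        return Some(Loc::Single(n));
    }
    n -= PTRS;
    if n < PTRS * PTRS {
        return Some(Loc::Double(n / PTRS, n % PTRS));
    }
    n -= PTRS * PTRS;
    if n < PTRS * PTRS * PTRS {
        return Some(Loc::Triple(n / (PTRS * PTRS), n / PTRS % PTRS, n % PTRS));
    }
    None // past the largest file this diskmap can describe
}

fn max_file_bytes() -> u64 {
    (DIRECT + PTRS + PTRS * PTRS + PTRS * PTRS * PTRS) * BLOCK
}

fn main() {
    // Byte offset 300,000 falls in logical block 300_000 / 1024.
    println!("{:?}", locate(300_000 / BLOCK));
    println!("max file size: {} bytes", max_file_bytes());
}
```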
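Slides 41-43 ask how a new file gets an inode and a data block. Below is a toy model, not the real S5FS on-disk format. The superblock keeps a free-block list and a small cache of free inode numbers, and it refills that cache by scanning the i-list when it runs dry. The cache size of 4, the struct layout and all names are hypothetical.

```rust
const INODE_CACHE: usize = 4; // hypothetical size of the superblock's free-inode cache

/// Toy stand-in for the structures on slides 42-43.
struct FileSystem {
    inode_in_use: Vec<bool>,      // stands in for the i-list
    free_inode_cache: Vec<usize>, // superblock: cached free inode numbers
    free_blocks: Vec<u32>,        // superblock: list of free data blocks
}

impl FileSystem {
    fn new(inode_in_use: Vec<bool>, free_blocks: Vec<u32>) -> Self {
        FileSystem { inode_in_use, free_inode_cache: Vec::new(), free_blocks }
    }

    fn alloc_block(&mut self) -> Option<u32> {
        self.free_blocks.pop()
    }

    fn alloc_inode(&mut self) -> Option<usize> {
        if self.free_inode_cache.is_empty() {
            // Cache ran dry: scan the i-list for unused inodes and refill it.
            self.free_inode_cache = self
                .inode_in_use
                .iter()
                .enumerate()
                .filter(|(_, used)| !**used)
                .map(|(i, _)| i)
                .take(INODE_CACHE)
                .collect();
            self.free_inode_cache.reverse(); // so pop() hands out the lowest number first
        }
        let ino = self.free_inode_cache.pop()?;
        self.inode_in_use[ino] = true;
        Some(ino)
    }

    fn free_inode(&mut self, ino: usize) {
        self.inode_in_use[ino] = false;
        if self.free_inode_cache.len() < INODE_CACHE {
            self.free_inode_cache.push(ino);
        }
    }
}

/// Creating a file takes one free inode plus one free data block.
fn create_file(fs: &mut FileSystem) -> Option<(usize, u32)> {
    let ino = fs.alloc_inode()?;
    match fs.alloc_block() {
        Some(b) => Some((ino, b)),
        None => {
            fs.free_inode(ino); // give the inode back if no block is available
            None
        }
    }
}

fn main() {
    // Inodes 0 and 1 in use; data blocks 7, 8, 9 free.
    let mut fs = FileSystem::new(vec![true, true, false, false], vec![9, 8, 7]);
    println!("{:?}", create_file(&mut fs)); // Some((2, 7))
    println!("{:?}", create_file(&mut fs)); // Some((3, 8))
    println!("{:?}", create_file(&mut fs)); // None: no free inodes left
}
```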
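Slides 47-49 move from per-block checksums to "hashing the hashes" in a Merkle tree: one root value then vouches for every block below it. Rust's standard library has no SHA-256, so this sketch substitutes `std`'s `DefaultHasher`. That hasher is non-cryptographic and used here purely to show the tree structure. Carrying an odd leftover hash up unchanged is one convention among several; the helper names are illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for SHA-256: DefaultHasher is NOT cryptographically secure.
fn h(bytes: &[u8]) -> u64 {
    let mut s = DefaultHasher::new();
    bytes.hash(&mut s);
    s.finish()
}

fn h_pair(a: u64, b: u64) -> u64 {
    let mut s = DefaultHasher::new();
    a.hash(&mut s);
    b.hash(&mut s);
    s.finish()
}

/// Hash every block, then repeatedly hash adjacent pairs until one root remains.
fn merkle_root(blocks: &[&[u8]]) -> Option<u64> {
    let mut level: Vec<u64> = blocks.iter().map(|b| h(b)).collect();
    if level.is_empty() {
        return None;
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [a, b] => h_pair(*a, *b),
                [a] => *a, // odd one out is carried up unchanged
                _ => unreachable!(),
            })
            .collect();
    }
    Some(level[0])
}

fn main() {
    let blocks: [&[u8]; 4] = [b"Block 1", b"Block 2", b"Block 3", b"Block 4"];
    let mut tampered = blocks;
    tampered[2] = &b"Block 3, bit-rotted"[..];
    println!("root before: {:?}", merkle_root(&blocks));
    println!("root after:  {:?}", merkle_root(&tampered));
}
```

Comparing a single stored root is enough to notice that any block changed. The per-level hashes then narrow down which one.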
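Slide 50's recovery rule, "if the checksum fails for the first copy read, try reading the second copy", fits in a few lines. The sketch is illustrative and not ZFS code. `StoredBlock` is a hypothetical type, and `DefaultHasher` again stands in for the real checksum algorithm.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in checksum (not SHA-256).
fn checksum(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// A block stored several times, plus the checksum recorded when it was written.
struct StoredBlock {
    copies: Vec<Vec<u8>>,
    checksum: u64,
}

impl StoredBlock {
    fn write(data: &[u8], copies: usize) -> Self {
        StoredBlock { copies: vec![data.to_vec(); copies], checksum: checksum(data) }
    }

    /// Return the first copy whose checksum verifies, or None if every copy is bad.
    fn read(&self) -> Option<&[u8]> {
        self.copies
            .iter()
            .map(|c| c.as_slice())
            .find(|c| checksum(c) == self.checksum)
    }
}

fn main() {
    let mut b = StoredBlock::write(b"lecture notes", 2); // copies = 2
    b.copies[0][0] ^= 0xff; // simulate bit rot in copy 1
    match b.read() {
        Some(data) => println!("recovered: {}", String::from_utf8_lossy(data)),
        None => println!("all copies corrupt"),
    }
}
```

Setting `copies` to 3 gives slide 51's "truly paranoid" variant with no other change.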
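Slides 52-55 introduce RAID as the cheaper alternative to full copies but do not show how the redundancy is computed. As a hedged illustration, the sketch below uses XOR parity, the idea behind the parity-based RAID levels in the 1988 paper. One extra block per stripe lets any single lost data block be rebuilt from the survivors. Stripe contents and function names are made up for the example.

```rust
/// XOR all blocks of a stripe together byte by byte (blocks must be equal length).
fn parity(blocks: &[Vec<u8>]) -> Vec<u8> {
    let len = blocks.first().map_or(0, |b| b.len());
    let mut p = vec![0u8; len];
    for b in blocks {
        assert_eq!(b.len(), len, "blocks in a stripe must be the same size");
        for (x, &y) in p.iter_mut().zip(b.iter()) {
            *x ^= y;
        }
    }
    p
}

/// Rebuild one missing data block: XOR of the surviving blocks and the parity block.
fn rebuild(survivors: &[Vec<u8>], parity_block: &[u8]) -> Vec<u8> {
    let mut all = survivors.to_vec();
    all.push(parity_block.to_vec());
    parity(&all)
}

fn main() {
    let stripe = vec![b"disk0...".to_vec(), b"disk1...".to_vec(), b"disk2...".to_vec()];
    let p = parity(&stripe);
    // Pretend disk 1 failed: rebuild it from disks 0 and 2 plus the parity disk.
    let rebuilt = rebuild(&[stripe[0].clone(), stripe[2].clone()], &p);
    println!("rebuilt: {}", String::from_utf8_lossy(&rebuilt));
}
```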
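Slides 58-59 describe ARC's four lists and its adaptation rule. Below is a compact sketch following the ARC algorithm as published by Nimrod Megiddo and Dharmendra Modha of IBM Almaden. The simplifications are ours: the target size `p` is an integer, so the |B2|/|B1| step uses integer division; lists are searched linearly; and only block numbers are tracked. It illustrates the policy and is not ZFS's ARC code.

```rust
use std::collections::VecDeque;

/// Simplified ARC over block numbers. Each list keeps LRU at the front, MRU at the back.
struct AdaptiveCache {
    c: usize,          // capacity: at most c blocks cached in T1 + T2
    p: usize,          // adaptive target size for T1
    t1: VecDeque<u64>, // recent: cached blocks seen once
    t2: VecDeque<u64>, // frequent: cached blocks seen at least twice
    b1: VecDeque<u64>, // ghosts: numbers of blocks evicted from T1
    b2: VecDeque<u64>, // ghosts: numbers of blocks evicted from T2
}

/// Remove `x` from `list` if present; report whether it was there.
fn take(list: &mut VecDeque<u64>, x: u64) -> bool {
    match list.iter().position(|&y| y == x) {
        Some(i) => list.remove(i).is_some(),
        None => false,
    }
}

impl AdaptiveCache {
    fn new(c: usize) -> Self {
        AdaptiveCache {
            c,
            p: 0,
            t1: VecDeque::new(),
            t2: VecDeque::new(),
            b1: VecDeque::new(),
            b2: VecDeque::new(),
        }
    }

    /// Evict one cached block into the matching ghost list (REPLACE in the ARC paper).
    fn replace(&mut self, x_in_b2: bool) {
        let n1 = self.t1.len();
        if n1 >= 1 && (n1 > self.p || (x_in_b2 && n1 == self.p)) {
            let v = self.t1.pop_front().unwrap();
            self.b1.push_back(v);
        } else if let Some(v) = self.t2.pop_front() {
            self.b2.push_back(v);
        }
    }

    /// Access block `x`; returns true on a cache hit.
    fn access(&mut self, x: u64) -> bool {
        if take(&mut self.t1, x) || take(&mut self.t2, x) {
            self.t2.push_back(x); // hit: x is now "frequent"
            return true;
        }
        let in_b1 = self.b1.contains(&x);
        if in_b1 || self.b2.contains(&x) {
            if in_b1 {
                // Ghost hit in B1: a larger T1 would have kept x, so raise T1's target.
                let delta = (self.b2.len() / self.b1.len()).max(1);
                self.p = (self.p + delta).min(self.c);
            } else {
                // Ghost hit in B2: a larger T2 would have kept x, so lower T1's target.
                let delta = (self.b1.len() / self.b2.len()).max(1);
                self.p = self.p.saturating_sub(delta);
            }
            self.replace(!in_b1);
            take(if in_b1 { &mut self.b1 } else { &mut self.b2 }, x);
            self.t2.push_back(x); // a ghost hit is still a miss, but x joins T2
            return false;
        }
        // Complete miss: make room and keep the ghost lists bounded.
        let l1 = self.t1.len() + self.b1.len();
        let total = l1 + self.t2.len() + self.b2.len();
        if l1 == self.c {
            if self.t1.len() < self.c {
                self.b1.pop_front();
                self.replace(false);
            } else {
                self.t1.pop_front(); // B1 is empty here; drop T1's LRU block outright
            }
        } else if total >= self.c {
            if total == 2 * self.c {
                self.b2.pop_front();
            }
            self.replace(false);
        }
        self.t1.push_back(x);
        false
    }
}

fn main() {
    let mut cache = AdaptiveCache::new(2);
    for block in [1, 2, 1, 3, 2, 1] {
        let hit = cache.access(block);
        println!("block {block}: {}, T1 target p = {}", if hit { "hit" } else { "miss" }, cache.p);
    }
}
```

In the trace above, the second access to block 2 hits the B1 ghost list. That raises `p` and drops block 1 from T2 to B2, exactly slide 59's rule. The later access to block 1 hits B2 and lowers `p` again.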