Storage: Alternate Futures

Old data from 1999, but the theory hasn't changed.

Published in: Technology, Business
Transcript

  • 1. Storage: Alternate Futures
    Jim Gray, Microsoft Research
    Research.Microsoft.com/~Gray/talks
    NetStore ’99, Seattle WA, 14 Oct 1999
    [Figure: scale of units: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta]
  • 2. Acknowledgments: Thank You!!
    • Dave Patterson:
      – Convinced me that processors are moving to the devices.
    • Kim Keeton and Erik Riedel
      – Showed that many useful subtasks can be done by disk-processors, and quantified execution interval
    • Remzi Dusseau
      – Re-validated Amdahl’s laws
  • 3. Outline
    • The Surprise-Free Future (5 years)
      – 500 mips cpus for 10$
      – 1 Gb RAM chips
      – MAD at 50 Gbpsi
      – 10 GBps SANs are ubiquitous
      – 1 GBps WANs are ubiquitous
    • Some consequences
      – Absurd (?) consequences.
      – Auto-manage storage
      – Raid10 replaces Raid5
      – Disc-packs
      – Disk is the archive media of choice
    • A surprising future?
      – Disks (and other useful things) become supercomputers.
      – Apps run “in the disk”
  • 4. The Surprise-free Storage Future
    • 1 Gb RAM chips
    • MAD at 50 Gbpsi
    • Drives shrink one quantum
    • Standard IO
    • 10 GBps SANs are ubiquitous
    • 1 Gbps WANs are ubiquitous
    • 5 bips cpus for 1K$ and 500 mips cpus for 10$
  • 5. 1 Gb RAM Chips
    • Moving to 256 Mb chips now
    • 1 Gb will be “standard” in 5 years, 4 Gb will be premium product.
    • Note:
      – 256 Mb = 32 MB: the smallest memory
      – 1 Gb = 128 MB: the smallest memory
  • 6. MAD at 50 Gbpsi
    • MAD: Magnetic Areal Density:
      – 3-10 Gbpsi in products
      – 20 Gbpsi in lab
      – 50 Gbpsi = paramagnetic limit, but… people have ideas.
    • Capacity: rise 10x in 5 years (conservative)
    • Bandwidth: rise 4x in 5 years (density+rpm)
    • Disk: 50 GB to 500 GB,
      – 60-80 MBps
      – 1k$/TB
      – 15 minute to 3 hour scan time.
  • 7. Disk vs Tape
    • Disk
      – 47 GB
      – 15 MBps
      – 10 ms seek time
      – 5 ms rotate time
      – 9$/GB for drive, 3$/GB for ctlrs/cabinet
      – 4 TB/rack
    • Tape
      – 40 GB
      – 5 MBps
      – 30 sec pick time
      – Many minute seek time
      – 5$/GB for media, 10$/GB for drive+library
      – 10 TB/rack
    • Guestimates: Cern: 200 TB, 3480 tapes; 2 col = 50 GB; Rack = 1 TB = 20 drives
    • The price advantage of tape is narrowing, and the performance advantage of disk is growing
  • 8. System On A Chip
    • Integrate Processing with memory on one chip
      – chip is 75% memory now
      – 1 MB cache >> 1960 supercomputers
      – 256 Mb memory chip is 32 MB!
      – IRAM, CRAM, PIM,… projects abound
    • Integrate Networking with processing on one chip
      – system bus is a kind of network
      – ATM, FiberChannel, Ethernet,.. Logic on chip.
      – Direct IO (no intermediate bus)
    • Functionally specialized cards shrink to a chip.
  • 9. 500 mips System On A Chip for 10$
    • 486 now 7$
      – 233 MHz ARM for 10$, system on a chip: http://www.cirrus.com/news/products99/news-product14.html
      – AMD/Celeron 266 ~ 30$
    • In 5 years, today’s leading edge will be
      – System on chip (cpu, cache, mem ctlr, multiple IO)
      – Low cost
      – Low-power
      – Have integrated IO
    • High end is 5 BIPS cpus
  • 10. Standard IO in 5 Years
    • Probably
    • Replace PCI with something better; will still need a mezzanine bus standard
    • Multiple serial links directly from processor
    • Fast (10 GBps/link) for a few meters
    • System Area Networks (SANs) ubiquitous (VIA morphs to SIO?)
  • 11. Ubiquitous 10 GBps SANs in 5 years
    • 1 Gbps Ethernet is a reality now.
      – Also FiberChannel, MyriNet, GigaNet, ServerNet, ATM,…
    • 10 Gbps x4 WDM deployed now (OC192)
      – 3 Tbps WDM working in lab
    • In 5 years, expect 10x, progress is astonishing
    • Gilder’s law: Bandwidth grows 3x/year http://www.forbes.com/asap/97/0407/090.htm
    [Figure: bandwidth scale: 5 MBps, 20 MBps, 40 MBps, 80 MBps, 120 MBps (1 Gbps), 1 GBps]
  • 12. Thin Clients mean HUGE servers
    • AOL hosting customer pictures
    • Hotmail allows 5 MB/user, 50 M users
    • Web sites offer electronic vaulting for SOHO.
    • IntelliMirror: replicate client state on server
    • Terminal server: timesharing returns
    • …. Many more.
  • 13. Standard Storage Metrics
    • Capacity:
      – RAM: MB and $/MB: today at 512 MB and 3$/MB
      – Disk: GB and $/GB: today at 50 GB and 10$/GB
      – Tape: TB and $/TB: today at 50 GB and 12k$/TB (nearline)
    • Access time (latency)
      – RAM: 100 ns
      – Disk: 10 ms
      – Tape: 30 second pick, 30 second position
    • Transfer rate
      – RAM: 1 GB/s
      – Disk: 15 MB/s (arrays can go to 1 GB/s)
      – Tape: 5 MB/s (striping is problematic, but “works”)
  • 14. New Storage Metrics: Kaps, Maps, SCAN?
    • Kaps: How many kilobyte objects served per second
      – The file server, transaction processing metric
      – This is the OLD metric.
    • Maps: How many megabyte objects served per second
      – The Multi-Media metric
    • SCAN: How long to scan all the data
      – The data mining and utility metric
    • And
      – Kaps/$, Maps/$, TBscan/$
  • 15. For the Record (good 1999 devices packaged in system)
    http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf
                          DRAM      DISK      TAPE robot
    Unit capacity (GB)    1         9         40 X 100
    Unit price $          5000      900       20000
    $/GB                  3300      12        12
    Latency (s)           1.E-7     2.E-3     3.E+1
    Bandwidth (MBps)      1000      15        20
    Kaps                  9.E+5     6.E+2     3.E-2
    Maps                  1.E+3     14.67     3.E-2
    Scan time (s/TB)      1         600       24500
    $/Kaps                6.E-11    1.E-8     6.E-3
    $/Maps                5.E-8     6.E-7     6.E-3
    $/TBscan              $0.05     $1        $129
    Tape is 1Tb with 4 DLT readers at 5MBps each.
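The slides do not state how Kaps, Maps, and scan time were derived, so the following is only a sketch under an assumed model: one object costs one access latency plus its transfer time, and a scan reads one unit's full capacity at its bandwidth. The function names are hypothetical. Under that assumption the DRAM figures and the disk scan time come out close to the table; the disk Kaps comes out somewhat lower than the listed 6.E+2, which may mean the table used a latency a little under the rounded 2.E-3.

```python
def ops_per_second(latency_s, bandwidth_mbps, object_mb):
    """Objects served per second by one device (assumed model:
    each object costs one access latency plus its transfer time)."""
    return 1.0 / (latency_s + object_mb / bandwidth_mbps)


def scan_seconds(capacity_gb, bandwidth_mbps):
    """Seconds to read one unit end to end (assumed: 1 GB = 1000 MB)."""
    return capacity_gb * 1000.0 / bandwidth_mbps


# (latency in s, bandwidth in MBps, unit capacity in GB), taken from the table
DEVICES = {
    "DRAM": (1e-7, 1000.0, 1.0),
    "DISK": (2e-3, 15.0, 9.0),
}

for name, (latency, bandwidth, capacity) in DEVICES.items():
    kaps = ops_per_second(latency, bandwidth, 0.001)  # 1 KB objects
    maps = ops_per_second(latency, bandwidth, 1.0)    # 1 MB objects
    scan = scan_seconds(capacity, bandwidth)
    print(f"{name}: Kaps={kaps:.3g} Maps={maps:.3g} scan={scan:.0f} s")
```

The model makes the slide's point visible: for kilobyte objects latency dominates, while for megabyte objects and scans the transfer rate does.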
  • 16. For the Record (good 1999 devices packaged in system)
    http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf
    [Chart: log-scale bars (1.E-11 to 1.E+7) comparing DRAM, DISK, and TAPE on Kaps, Maps, Scan time (s/TB), $/Kaps, $/Maps, $/TBscan]
    Tape is 1Tb with 4 DLT readers at 5MBps each.
  • 17. The Access Time Myth
    • The Myth: seek or pick time dominates
    • The reality:
      – (1) Queuing dominates
      – (2) Transfer dominates BLOBs
      – (3) Disk seeks often short
    • Implication: many cheap servers better than one fast expensive server
      – shorter queues
      – parallel transfer
      – lower cost/access and cost/byte
    • This is obvious for disk arrays
    • This is even more obvious for tape arrays
    [Figure: access-time breakdown bars: Seek, Rotate, Transfer, Wait]
  • 18. Storage Ratios Changed
    • 10x better access time
    • 10x more bandwidth
    • 4,000x lower media price
    • DRAM/disk media price ratio changed
      – 1970-1990 100:1
      – 1990-1995 10:1
      – 1995-1997 50:1
      – today ~ 30:1 (0.1$pMB disk, 3$pMB dram)
    [Charts: Disk Performance vs Time (seeks per second; bandwidth: MB/s; capacity (GB)); Disk accesses/second vs Time; Storage Price vs Time (megabytes per kilo-dollar); x-axis: Year, 1980 to 2000]
  • 19. Data on Disk Can Move to RAM in 8 years
    [Chart: Storage Price vs Time, megabytes per kilo-dollar (MB/k$, log scale, 0.1 to 10,000) by year, 1980 to 2000; annotations: 30:1, 6 years]
  • 20. Outline
    • The Surprise-Free Future (5 years)
      – 500 mips cpus for 10$
      – 1 Gb RAM chips
      – MAD at 50 Gbpsi
      – 10 GBps SANs are ubiquitous
      – 1 GBps WANs are ubiquitous
    • Some consequences
      – Absurd (?) consequences.
      – Auto-manage storage
      – Raid10 replaces Raid5
      – Disc-packs
      – Disk is the archive media of choice
    • A surprising future?
      – Disks (and other useful things) become supercomputers.
      – Apps run “in the disk”.
  • 21. The (absurd?) consequences
    • Premises:
      – 1 Gb RAM chips
      – MAD at 50 Gbpsi
      – Drives shrink one quantum
      – 10 GBps SANs are ubiquitous
      – 500 mips cpus for 10$
      – 5 bips cpus at high end
    • Consequences:
      – 256 way nUMA?
      – Huge main memories: now: 500 MB - 64 GB memories; then: 10 GB - 1 TB memories
      – Huge disks: now: 5-50 GB 3.5” disks; then: 50-500 GB disks
      – Petabyte storage farms (that you can’t back up or restore).
      – Disks >> tapes; “Small” disks: one platter, one inch, 10 GB
      – SAN convergence: 1 GBps point to point is easy
  • 22. The Absurd? Consequences
    • Further segregate processing from storage
    • Poor locality
    • Much useless data movement
    • Amdahl’s laws: bus: 10 B/ips, io: 1 b/ips
    [Figure: Processors (~ 1 Tips), RAM Memory (~ 1 TB), Disks (~ 100 TB); links labeled 10 TBps and 100 GBps]
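As a rough check of the figure, the two ratios quoted on the slide (bus: 10 bytes per instruction per second, io: 1 bit per instruction per second) can be applied to the ~1 Tips of aggregate processing. This is a sketch; the function name is hypothetical and the only inputs are the ratios from the slide.

```python
def amdahl_balance(instructions_per_second):
    """Return (bus bytes/s, io bytes/s) for a balanced system, using the
    ratios quoted on the slide: bus 10 B/ips and io 1 b/ips."""
    bus_bytes_per_s = 10 * instructions_per_second
    io_bytes_per_s = instructions_per_second / 8  # 1 bit per instruction
    return bus_bytes_per_s, io_bytes_per_s


bus, io = amdahl_balance(1e12)  # ~1 Tips, as in the figure
print(f"bus: {bus / 1e12:.0f} TBps, io: {io / 1e9:.0f} GBps")
```

For 1 Tips this gives 10 TBps of memory bandwidth and 125 GBps of IO, in line with the 10 TBps and roughly 100 GBps labels in the figure.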
  • 23. Storage Latency: How Far Away is the Data?
    Relative latency   Storage level          Analogy      Travel time
    10^9               Tape / Optical Robot   Andromeda    2,000 Years
    10^6               Disk                   Pluto        2 Years
    100                Memory                 Olympia      1.5 hr
    10                 On Board Cache         This Hotel   10 min
    2                  On Chip Cache          This Room
    1                  Registers              My Head      1 min
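The analogy appears to map one unit of relative latency to one minute of travel (registers: 1 unit, 1 min). Assuming that reading, a short sketch shows that the travel times on the slide are rounded values of the same scale. The names below are hypothetical.

```python
# Assumption: one unit of relative latency is read as one minute,
# as the "Registers = 1 = 1 min" entry suggests.
MINUTES_PER_HOUR = 60
MINUTES_PER_YEAR = 60 * 24 * 365

LEVELS = [
    ("Registers", 1),
    ("On Chip Cache", 2),
    ("On Board Cache", 10),
    ("Memory", 100),
    ("Disk", 10**6),
    ("Tape / Optical Robot", 10**9),
]


def as_hours(units):
    return units / MINUTES_PER_HOUR


def as_years(units):
    return units / MINUTES_PER_YEAR


for name, units in LEVELS:
    if units >= MINUTES_PER_YEAR:
        print(f"{name}: about {as_years(units):,.1f} years")
    elif units >= MINUTES_PER_HOUR:
        print(f"{name}: about {as_hours(units):.1f} hours")
    else:
        print(f"{name}: {units} min")
```

On this scale memory is about 1.7 hours away, disk about 1.9 years, and the tape robot about 1,900 years, which matches the slide's 1.5 hr, 2 years, and 2,000 years after rounding.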
  • 24. Consequences
    • AutoManage Storage
    • Sixpacks (for arm-limited apps)
    • Raid5 -> Raid10
    • Disk-to-disk backup
    • Smart disks
  • 25. Auto Manage Storage
    • 1980 rule of thumb:
      – A DataAdmin per 10 GB, SysAdmin per mips
    • 2000 rule of thumb
      – A DataAdmin per 5 TB
      – SysAdmin per 100 clones (varies with app).
    • Problem:
      – 5 TB is 60k$ today, 10k$ in a few years.
      – Admin cost >> storage cost???
    • Challenge:
      – Automate ALL storage admin tasks
  • 26. The “Absurd” Disk
    • 2.5 hr scan time (poor sequential access)
    • 1 aps / 5 GB (VERY cold data)
    • It’s a tape!
    [Figure: disk labeled 1 TB, 100 MB/s, 200 Kaps]
  • 27. Extreme case: 1 TB disk: Alternatives
    • Use all the heads in parallel
      – Scan in 30 minutes
      – Still one Kaps/5 GB
      [Figure: disk labeled 1 TB, 500 MB/s, 200 Kaps]
    • Use one platter per arm
      – Share power/sheetmetal
      – Scan in 30 minutes each
      – One Kaps per GB
      [Figure: platters labeled 200 GB and 500 MB/s; 1,000 Kaps total]
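The scan times and access densities quoted for the 1 TB disk follow from simple division. The sketch below assumes 1 TB = 1000 GB and 1 GB = 1000 MB; the helper names are hypothetical, and the inputs are the capacity, bandwidth, and Kaps labels from the two slides.

```python
def scan_hours(capacity_gb, bandwidth_mb_s):
    """Hours to read the whole device sequentially (1 GB = 1000 MB assumed)."""
    return capacity_gb * 1000.0 / bandwidth_mb_s / 3600.0


def gb_per_kaps(capacity_gb, kaps):
    """Gigabytes of stored data behind each kilobyte access per second."""
    return capacity_gb / kaps


# The "absurd" disk: 1 TB, 100 MB/s, 200 Kaps
print(f"one arm:       {scan_hours(1000, 100):.1f} h scan, "
      f"{gb_per_kaps(1000, 200):.0f} GB per Kaps")

# All heads in parallel: 500 MB/s, still one arm
print(f"parallel heads: {scan_hours(1000, 500) * 60:.0f} min scan, "
      f"{gb_per_kaps(1000, 200):.0f} GB per Kaps")

# One platter per arm: 1,000 Kaps in total over the same 1 TB
print(f"arm per platter: {gb_per_kaps(1000, 1000):.0f} GB per Kaps")
```

The results (about 2.8 hours, about 33 minutes, and 5 GB versus 1 GB per Kaps) agree with the rounded 2.5 hr, 30 minutes, and Kaps densities on the slides. Parallel heads fix the scan time, but only more arms improve accesses per gigabyte.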
  • 28. Drives shrink (1.8”, 1”)
    • 150 kaps for 500 GB is VERY cold data
    • 3 GB/platter today, 30 GB/platter in 5 years.
    • Most disks are ½ full
    • TPC benchmarks use 9 GB drives (need arms or bandwidth).
    • One solution: smaller form factor
      – More arms per GB
      – More arms per rack
      – More arms per Watt
  • 29. Prediction: 6-packs
    • One way or another, when disks get huge
      – Will be packaged as multiple arms
      – Parallel heads give bandwidth
      – Independent arms give bandwidth & aps
    • Package shares power, package, interfaces…
  • 30. Stripes, Mirrors, Parity (RAID 0, 1, 5)
    • RAID 0: Stripes
      – bandwidth
      [Figure: three disks holding blocks 0,3,6,.. | 1,4,7,.. | 2,5,8,..]
    • RAID 1: Mirrors, Shadows,…
      – Fault tolerance
      – Reads faster, writes 2x slower
      [Figure: two disks holding blocks 0,1,2,.. | 0,1,2,..]
    • RAID 5: Parity
      – Fault tolerance
      – Reads faster
      – Writes 4x or 6x slower.
      [Figure: three disks holding blocks 0,2,P2,.. | 1,P1,4,.. | P0,3,5,..]
  • 31. RAID 10 (stripes of mirrors) Wins: “wastes space, saves arms”
    • RAID 5:
      – Performance: 225 reads/sec, 70 writes/sec
      – Write: 4 logical IO, 2 seek + 1.7 rotate
      – SAVES SPACE
      – Performance degrades on failure
    • RAID1:
      – Performance: 250 reads/sec, 100 writes/sec
      – Write: 2 logical IO, 2 seek + 0.7 rotate
      – SAVES ARMS
      – Performance improves on failure
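The "4 logical IO" versus "2 logical IO" write costs can be illustrated with the usual XOR parity update for a small RAID 5 write. This is a generic sketch of that technique, not code from the talk; blocks are modeled as Python bytes, and all names are hypothetical.

```python
def xor_blocks(blocks):
    """XOR equal-length blocks together (parity of a stripe)."""
    out = bytes(len(blocks[0]))
    for block in blocks:
        out = bytes(x ^ y for x, y in zip(out, block))
    return out


def raid5_small_write(stripe, parity, index, new_block):
    """Update one data block in place; return (new parity, logical IOs)."""
    old_block = stripe[index]   # IO 1: read old data
    old_parity = parity         # IO 2: read old parity
    new_parity = bytes(p ^ o ^ n
                       for p, o, n in zip(old_parity, old_block, new_block))
    stripe[index] = new_block   # IO 3: write new data
    return new_parity, 4        # IO 4: write new parity


def mirror_write(copies, index, new_block):
    """Write the block to both mirrors; return logical IOs."""
    for copy in copies:
        copy[index] = new_block
    return len(copies)


stripe = [b"\x01\x02", b"\x10\x20"]
parity = xor_blocks(stripe)
parity, raid5_ios = raid5_small_write(stripe, parity, 0, b"\xff\x00")
mirror_ios = mirror_write([[b"a"], [b"a"]], 0, b"b")
print(raid5_ios, mirror_ios, parity == xor_blocks(stripe))
```

The parity stays consistent without reading the other data blocks, but every small write touches two arms twice, which is the arm cost that mirroring avoids.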
  • 32. The Storage Rack Today
    • 140 arms
    • 4 TB
    • 24 racks, 24 storage processors, 6+1 in rack
    • Disks = 2.5 GBps IO
    • Controllers = 1.2 GBps IO
    • Ports = 500 MBps IO
  • 33. Storage Rack in 5 years?
    • 140 arms
    • 50 TB
    • 24 racks, 24 storage processors, 6+1 in rack
    • Disks = 2.5 GBps IO
    • Controllers = 1.2 GBps IO
    • Ports = 500 MBps IO
    • My suggestion: move the processors into the storage racks.
  • 34. It’s hard to archive a PetaByte. It takes a LONG time to restore it.
    • Store it in two (or more) places online (on disk?).
    • Scrub it continuously (look for errors)
    • On failure, refresh lost copy from safe copy.
    • Can organize the two copies differently (e.g.: one by time, one by space)
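The scrub-and-refresh idea can be sketched as follows. This is an illustration under stated assumptions rather than anything from the talk: the two online copies are modeled as dictionaries from block id to bytes, a trusted SHA-256 digest per block is assumed to be available, and the function names are hypothetical.

```python
import hashlib


def checksum(data):
    return hashlib.sha256(data).hexdigest()


def is_good(copy, block_id, expected):
    data = copy.get(block_id)
    return data is not None and checksum(data) == expected


def scrub(copy_a, copy_b, checksums):
    """Compare both copies against the expected digests and refresh a bad
    or missing block from the good copy. Returns (repaired, lost)."""
    repaired, lost = [], []
    for block_id, expected in checksums.items():
        a_ok = is_good(copy_a, block_id, expected)
        b_ok = is_good(copy_b, block_id, expected)
        if a_ok and not b_ok:
            copy_b[block_id] = copy_a[block_id]
            repaired.append((block_id, "b"))
        elif b_ok and not a_ok:
            copy_a[block_id] = copy_b[block_id]
            repaired.append((block_id, "a"))
        elif not a_ok and not b_ok:
            lost.append(block_id)  # no safe copy left to refresh from
    return repaired, lost


site_a = {0: b"alpha", 1: b"beta"}
site_b = {0: b"alpha", 1: b"bet?"}  # simulated silent corruption
digests = {0: checksum(b"alpha"), 1: checksum(b"beta")}
print(scrub(site_a, site_b, digests))
```

Running such a pass continuously keeps the window short in which both copies of a block could be bad at once, which is the case the `lost` list reports.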
  • 35. Crazy Disk Ideas
    • Disk Farm on a card: surface mount disks
    • Disk (magnetic store) on a chip: (micro machines in Silicon)
    • Full Apps (e.g. SAP, Exchange/Notes,..) in the disk controller ASIC (a processor with 128 MB dram)
    • The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail, Clayton M. Christensen. ISBN: 0875845851
  • 36. The Disk Farm On a Card
    • The 500 GB disc card
    • An array of discs
    • Can be used as
      – 100 discs
      – 1 striped disc
      – 50 Fault Tolerant discs
      – ....etc
    • LOTS of accesses/second, bandwidth
    [Figure: card of discs, dimension labeled 14"]
  • 37. Functionally Specialized Cards
    • Storage
    • Network
    • Display
    • Each card: an ASIC plus a P mips processor and M MB DRAM
      – Today: P = 50 mips, M = 2 MB
      – In a few years: P = 200 mips, M = 64 MB
  • 38. It’s Already True of Printers: Peripheral = CyberBrick
    • You buy a printer
    • You get
      – several network interfaces
      – a Postscript engine: cpu, memory, software, a spooler (soon)
      – and… a print engine.
  • 39. All Device Controllers will be Cray 1’s
    • TODAY
      – Disk controller is 10 mips risc engine with 2 MB DRAM
      – NIC is similar power
    • SOON
      – Will become 100 mips systems with 100 MB DRAM.
    • They are nodes in a federation (can run Oracle on NT in disk controller).
    • Advantages
      – Uniform programming model
      – Great tools
      – Security
      – Economics (cyberbricks)
      – Move computation to data (minimize traffic)
    [Figure: Central Processor & Memory attached to a Tera Byte Backplane]
  • 40. With Tera Byte Interconnect and Super Computer Adapters
    • Processing is incidental to
      – Networking
      – Storage
      – UI
    • Disk Controller/NIC is
      – faster than device
      – close to device
      – Can borrow device package & power
    • So use idle capacity for computation.
    • Run app in device.
    • Both Kim Keeton’s (UCB) and Erik Riedel’s (CMU) theses investigate this and show benefits of this approach.
    [Figure: Tera Byte Backplane]
  • 41. Implications
    • Conventional
      – Offload device handling to NIC/HBA
      – higher level protocols: I2O, NASD, VIA, IP, TCP…
      – SMP and Cluster parallelism is important.
    • Radical
      – Move app to NIC/device controller
      – higher-higher level protocols: CORBA / COM+.
      – Cluster parallelism is VERY important.
    [Figure: Central Processor & Memory attached to a Tera Byte Backplane]
  • 42. How Do They Talk to Each Other?
    • Each node has an OS
    • Each node has local resources: A federation.
    • Each node does not completely trust the others.
    • Nodes use RPC to talk to each other
      – CORBA? COM+? RMI?
      – One or all of the above.
    • Huge leverage in high-level interfaces.
    • Same old distributed system story.
    [Figure: two nodes, each a stack of Applications over datagrams, streams, RPC, ? over SIO, connected by a SAN]
  • 43. Outline
    • The Surprise-Free Future (5 years)
      – Astonishing hardware progress.
    • Some consequences
      – Absurd (?) consequences.
      – Auto-manage storage
      – Raid10 replaces Raid5
      – Disc-packs
      – Disk is the archive media of choice
    • A surprising future?
      – Disks (and other useful things) become supercomputers.
      – Apps run “in the disk”
