RAMCloud: Scalable Datacenter Storage Entirely in DRAM

John Ousterhout
Stanford University

HPTS, October 27, 2009

Introduction

● New research project at Stanford
● Create large-scale storage systems entirely in DRAM
● Interesting combination: scale, low latency
● The future of datacenter storage
● Low latency disruptive to database community

RAMCloud Overview

● Storage for datacenters
● 10-10000 commodity servers
● ~64 GB DRAM/server
● All data always in RAM
● Durable and available
● High throughput: 1M ops/sec/server
● Low-latency access: 5-10 µs RPC

[Figure: application servers issuing RPCs to storage servers within a datacenter.]

Example Configurations

                        Today       5-10 years
    # servers           1000        1000
    GB/server           64 GB       1024 GB
    Total capacity      64 TB       1 PB
    Total server cost   $4M         $4M
    $/GB                $60         $4
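
A quick back-of-the-envelope check of the $/GB row, as a sketch; the even spread of total cost across servers (~$4K/server) is an assumption, not stated on the slide:

    # Check of the $/GB figures in the table above.
    servers = 1000
    total_cost = 4_000_000           # $4M in both configurations

    gb_today = 64 * servers          # 64 GB/server  -> 64,000 GB (64 TB)
    gb_future = 1024 * servers       # 1 TB/server   -> ~1 PB

    print(total_cost / gb_today)     # ~$62/GB, rounded to $60 on the slide
    print(total_cost / gb_future)    # ~$3.9/GB, rounded to $4 on the slide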

RAMCloud Motivation

● Relational databases don’t scale
● Every large-scale Web application has problems:
        Facebook: 4000 MySQL servers + 2000 memcached servers
● New forms of storage starting to appear:
        Bigtable
        Dynamo
        PNUTS
        H-store
        memcached

RAMCloud Motivation, cont’d

Disk access rate is not keeping up with disk capacity (the capacity/bandwidth rows are worked out below):

                                         Mid-1980’s    2009        Change
    Disk capacity                        30 MB         500 GB      16667x
    Max. transfer rate                   2 MB/s        100 MB/s    50x
    Latency (seek & rotate)              20 ms         10 ms       2x
    Capacity/bandwidth (large blocks)    15 s          5000 s      333x
    Capacity/bandwidth (1 KB blocks)     600 s         58 days     8333x
    Jim Gray’s rule                      5 min         30 hrs      360x

● Disks must become more archival
● More information must move to memory
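
The capacity/bandwidth rows follow directly from the 2009 figures in the table; a sketch of the arithmetic, assuming one ~10 ms seek per 1 KB block:

    # Time to read an entire 2009 disk, from the table above.
    capacity = 500e9                 # 500 GB in bytes
    bandwidth = 100e6                # 100 MB/s sequential transfer
    seek = 10e-3                     # ~10 ms per random access

    large_blocks = capacity / bandwidth         # sequential read of whole disk
    small_blocks = (capacity / 1e3) * seek      # 500M 1 KB blocks, one seek each
    print(large_blocks)                         # 5000.0 seconds
    print(small_blocks / 86400)                 # ~58 days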

Impact of Latency

[Figure: a traditional application keeps UI, business logic, and data structures
on a single machine (<< 1 µs latency); a Web application splits UI, application
servers, and storage servers across a datacenter (0.5-10 ms latency).]

● Large-scale apps struggle with high latency
● RAMCloud goal: low latency and large scale
● Enable a new breed of information-intensive applications

Research Issues

● Achieving 5-10 µs RPC
● Durability at low latency
● Data model
● Concurrency/consistency model
● Data distribution, scaling
● Automated management
● Multi-tenancy
● Node architecture

Low Latency: SQL is Dead?

● Relational query model tied to high latency:
        Describe what you need up front
        DBMS optimizes retrieval
● With sufficiently low latency:
        Don’t need optimization; make individual requests as needed (sketched below)
        Can’t afford query processing overhead
        The relational query model will disappear
● Question: what systems offer very low latency and use the relational model?
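
A hypothetical sketch of what "individual requests as needed" looks like: the application simply chases references itself rather than handing a declarative query to a DBMS. The client object and its read() call are illustrative placeholders, not a real RAMCloud API.

    # Hypothetical client-side "query": follow references one read at a time.
    # With millisecond storage this would be hopeless; at 5-10 us per read,
    # a few hundred dependent reads still finish in about a millisecond.
    def friends_of(client, user_id):
        user = client.read("users", user_id)          # one small object
        return [client.read("users", fid)             # one read per friend
                for fid in user["friend_ids"]]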

Low Latency: Stronger Consistency?

● Cost of consistency rises with transaction overlap (worked example below):

        O ~ R * D

        O = # of overlapping transactions
        R = arrival rate of new transactions
        D = duration of each transaction

● R increases with system scale
        Eventually, scale makes consistency unaffordable
● But D decreases with lower latency
        Stronger consistency affordable at larger scale
        Is this phenomenon strong enough to matter?
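
A worked instance of O ~ R * D; the specific numbers below are assumed for illustration, not taken from the slide:

    # O ~ R * D: expected number of concurrently overlapping transactions.
    R = 10_000          # new transactions per second (assumed)
    D_disk = 10e-3      # ~10 ms per transaction with disk-based storage
    D_ram  = 10e-6      # ~10 us per transaction at RAMCloud-style latency

    print(R * D_disk)   # ~100 overlapping transactions
    print(R * D_ram)    # ~0.1 -- overlap becomes rare, so R (i.e. scale) can
                        # grow ~1000x before overlap returns to the old level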

Low Latency: One Size Fits All Again?

● "One-size-fits-all is dead" - Mike Stonebraker
● Specialized databases proliferating:
        50x performance improvements in specialized domains
        Optimize disk layout to eliminate seeks
● With low latency:
        Layout doesn’t matter
        General-purpose is fast
        One-size-fits-all rides again?

Conclusions

● All online data is moving to RAM
● RAMClouds = the future of datacenter storage
● Low latency will change everything:
        New applications
        Stronger consistency at scale
        One-size-fits-all again
        SQL is dead
● 1000-10000 clients accessing 100 TB - 1 PB at 5-10 µs latency

Questions/Comments?

For more on RAMCloud motivation & research issues:
        “The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM”
        To appear in Operating Systems Review
        http://www.stanford.edu/~ouster/cgi-bin/papers/ramcloud.pdf
        Or, google “RAMCloud”

Backup Slides

Why Not a Caching Approach?

● Lost performance:
        1% misses → 10x performance degradation (arithmetic sketched below)
● Won’t save much money:
        Already have to keep information in memory
        Example: Facebook caches ~75% of data size
● Changes disk management issues:
        Optimize for reads, vs. writes & recovery
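
The 10x figure follows from the gap between a DRAM hit and a disk miss; a sketch under assumed hit/miss latencies:

    # Average access time with a small miss rate (assumed latencies).
    hit = 10e-6          # ~10 us for a DRAM hit
    miss = 10e-3         # ~10 ms for a miss that goes to disk
    miss_rate = 0.01

    avg = (1 - miss_rate) * hit + miss_rate * miss
    print(avg / hit)     # ~11x slower than an all-DRAM system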

Why not Flash Memory?

● Many candidate technologies besides DRAM:
        Flash (NAND, NOR)
        PC RAM
        …
● DRAM enables the lowest latency:
        5-10x faster than flash
● Most RAMCloud techniques will apply to other technologies
● Ultimately, choose storage technology based on cost, performance, and energy, not volatility

Is RAMCloud Capacity Sufficient?

● Facebook: 200 TB of (non-image) data today
● Amazon (estimate worked out below):
        Revenues/year:           $16B
        Orders/year:             400M?  ($40/order?)
        Bytes/order:             1000-10000?
        Order data/year:         0.4-4.0 TB?
        RAMCloud cost:           $24K-240K?
● United Airlines:
        Total flights/day:       4000?  (30,000 for all airlines in the U.S.)
        Passenger flights/year:  200M?
        Bytes/passenger-flight:  1000-10000?
        Passenger data/year:     0.2-2.0 TB?
        RAMCloud cost:           $13K-130K?
● Ready today for all online data; media soon
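
The Amazon line of reasoning, spelled out; every input is one of the slide's own rough guesses (marked "?" above), not independent data:

    # Amazon estimate from the slide (all inputs are rough guesses).
    revenue = 16e9                      # $16B/year
    order_value = 40                    # ~$40/order
    orders = revenue / order_value      # ~400M orders/year
    bytes_per_order = (1_000, 10_000)

    tb_per_year = [orders * b / 1e12 for b in bytes_per_order]
    cost = [tb * 1e3 * 60 for tb in tb_per_year]   # at ~$60/GB today
    print(tb_per_year)    # [0.4, 4.0] TB/year
    print(cost)           # [$24,000, $240,000]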

Data Durability/Availability

● Data must be durable when the write RPC returns
● Unattractive possibilities:
        Synchronous disk write (100-1000x too slow)
        Replicate in other memories (too expensive)
● One possibility: log to RAM, then disk (sketched in code below)

[Figure: a write is logged to DRAM on the storage servers, then written to disk
asynchronously in batches.]
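
A minimal sketch of the buffered-logging write path described above, assuming the scheme in the figure: the write returns once the update sits in the primary's DRAM and in log buffers on backup servers, and backups flush to disk asynchronously in batches. All class and method names here are illustrative, not RAMCloud's actual interfaces.

    import threading, time

    class Backup:
        """Buffers log entries in DRAM, flushes them to disk asynchronously."""
        def __init__(self, path):
            self.buffer, self.lock, self.path = [], threading.Lock(), path
            threading.Thread(target=self._flusher, daemon=True).start()

        def append(self, entry):                  # called synchronously by primary
            with self.lock:
                self.buffer.append(entry)

        def _flusher(self):                       # asynchronous, batched disk writes
            while True:
                time.sleep(0.1)
                with self.lock:
                    batch, self.buffer = self.buffer, []
                if batch:
                    with open(self.path, "ab") as f:
                        f.write(b"".join(batch))  # one large sequential write

    class PrimaryServer:
        def __init__(self, backups):
            self.table, self.backups = {}, backups

        def write(self, key, value):
            self.table[key] = value               # update primary DRAM
            entry = f"{key}={value}\n".encode()
            for b in self.backups:                # buffered in backup DRAM
                b.append(entry)
            return "ok"                           # return before data hits disk

For example, PrimaryServer([Backup("b1.log"), Backup("b2.log")]).write("k", "v") returns as soon as both backups hold the entry in memory; the power-failure caveat this leaves open is discussed on the next slide.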

Durability/Availability, cont’d

● Buffered logging supports ~50K writes/sec/server (vs. 1M reads)
● Need fast recovery after crashes:
        Read 64 GB from disk? ~10 minutes
        Shard backup data across 100’s of servers (arithmetic below)
        Reduces recovery time to 1-2 seconds
● Other issues:
        Power failures
        Cross-datacenter replication
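
The recovery numbers, worked out under an assumed ~100 MB/s per-disk read rate and an assumed shard count of a few hundred:

    # Recovering a crashed server's 64 GB of data.
    data = 64e9
    disk_bw = 100e6                   # ~100 MB/s per disk (assumed)

    print(data / disk_bw)             # ~640 s (~10 minutes) from one disk
    shards = 500                      # backup data spread across many servers
    print(data / shards / disk_bw)    # ~1.3 s when hundreds of disks read in parallel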

Low-Latency RPCs

Achieving 5-10 µs will impact every layer of the system:

● Must reduce network latency:
        Typical today: 10-30 µs/switch, 5 switches each way
        Arista: 0.9 µs/switch → 9 µs round trip (arithmetic below)
        Need cut-through routing, congestion management
● Tailor the OS on the server side:
        Dedicated cores
        No interrupts?
        No virtual memory?

[Figure: client and server connected through five switches in each direction.]
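
The switching arithmetic behind the slide's numbers, taking ~20 µs/switch as an assumed midpoint of today's 10-30 µs range:

    # Round-trip switching delay: 5 switches each way.
    switches_each_way = 5

    today = 20e-6 * switches_each_way * 2     # ~200 us of switching per round trip
    arista = 0.9e-6 * switches_each_way * 2   # 0.9 us/switch -> 9 us round trip
    print(today * 1e6, arista * 1e6)          # 200.0 9.0 (microseconds)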

Low-Latency RPCs, cont’d

● Client side: need efficient path through VM
        User-level access to the network interface?
● Network protocol stack:
        TCP too slow (especially with packet loss)
        Must avoid copies
● Preliminary experiments:
        10-15 µs round trip
        Direct connection: no switches

Interesting Facets

● Use each system property to improve the others
● High server throughput:
        No replication for performance, only for durability?
        Simplifies consistency issues
● Low-latency RPC:
        Cheap to reflect writes to backup servers
        Stronger consistency?
● 1000’s of servers:
        Sharded backups for fast recovery

New Conference!

USENIX Conference on Web Application Development:

● All topics related to developing and deploying Web applications
● First conference: June 20-25, 2010, Boston
● Paper submission deadline: January 11, 2010
● http://www.usenix.org/events/webapps10/
