
the FARM project - Systems Research Challenges Workshop, 16th-17th January


We describe the design and implementation of FaRM, a new main-memory distributed computing platform that exploits RDMA communication to improve both latency and throughput by an order of magnitude relative to state-of-the-art main-memory systems that use TCP/IP. FaRM exposes the memory of machines in the cluster as a shared address space. Applications can allocate, read, write, and free objects in the address space. They can use distributed transactions to simplify dealing with complex corner cases that do not significantly impact performance. FaRM provides good common-case performance with lock-free reads over RDMA and with support for collocating objects and function shipping to enable the use of efficient single-machine transactions. FaRM uses RDMA both to directly access data in the shared address space and for fast messaging, and is carefully tuned for the best RDMA performance. We used FaRM to build a key-value store and a graph store similar to Facebook’s. They both perform well; for example, a 20-machine cluster can perform 160 million key-value lookups per second with a latency of 31µs.
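The programming model the abstract describes (allocate, read, write, and free objects in a shared address space, with transactions on top) can be sketched compactly. The Python below is a single-process illustration under assumed names (`AddressSpace`, `Tx`), not FaRM's actual C++ API; it shows the shape of an optimistic transaction that buffers writes and validates read versions at commit.

```python
# Hypothetical single-process sketch of a FaRM-style programming model:
# a shared address space of versioned objects plus optimistic transactions.
# All names are illustrative, not FaRM's real API.

class AddressSpace:
    def __init__(self):
        self._objects = {}   # address -> (version, value)
        self._next = 0

    def alloc(self, value=None):
        addr = self._next
        self._next += 1
        self._objects[addr] = (0, value)
        return addr

    def free(self, addr):
        del self._objects[addr]

    def read(self, addr):
        return self._objects[addr]           # (version, value)

    def write(self, addr, value):
        version, _ = self._objects[addr]
        self._objects[addr] = (version + 1, value)


class Tx:
    """Optimistic transaction: buffer writes, validate read versions at commit."""
    def __init__(self, space):
        self.space = space
        self.reads = {}    # addr -> version observed at first read
        self.writes = {}   # addr -> buffered new value

    def read(self, addr):
        if addr in self.writes:              # read-your-writes
            return self.writes[addr]
        version, value = self.space.read(addr)
        self.reads.setdefault(addr, version)
        return value

    def write(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        # Validate: every object we read must still have the version we saw.
        for addr, version in self.reads.items():
            if self.space.read(addr)[0] != version:
                return False                 # conflict: abort
        for addr, value in self.writes.items():
            self.space.write(addr, value)
        return True
```

A typical use is a transfer between two objects: read both, write both, and commit; a concurrent conflicting write makes `commit()` return `False` so the caller can retry.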



  1. Hadoop / Cosmos; SQL database
  2. Shared address space (objects O1–O9); transactions; replicated in memory; performance: high throughput, low latency
  3. CPU is the bottleneck
  4. Use one-sided RDMA operations: reduce message counts, effectively use parallelism, and design the system from first principles to use the hardware effectively
  5. [Diagram: a machine's memory partitioned into 2 GB regions, accessed via NIC and CPU; Machine B shown with its own 2 GB regions]
  6. [Diagram: C with P1/B1 and P2/B2, first step of a protocol animation]
  7. [Diagram: same, next step of the animation]
  8. [Diagram: same, final step of the animation]
  9. [Plot: latency (µs) vs. throughput (ops/µs); Latency 99% and Latency 50% curves]
  10. [Plot: throughput (ops/µs) vs. time (ms)]
  11. [Plot: throughput (ops/µs) vs. time (ms)]
  12. Versioned objects: the writer locks the object, updates it, then unlocks and increments the version; the reader reads the version, reads the data, and the result is consistent if the versions match and the object is not locked. Versions are 64-bit to avoid overflow, but a read requires three network accesses.
  13. Per-cache-line versions: every cache line of an object carries a version, so a single RDMA read suffices; the read is consistent if all cache-line versions match and the read does not take too long (otherwise a wrapped version could alias). With tupdate_min = 40 ns and 16-bit cache-line versions, tread_max = 40 ns × 2^16 × (1 − ε) ≈ 2 ms.
  14. [Plot: latency (ms) vs. TPC-C throughput (tpmC, millions); Latency 99% and Latency 50% curves]
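Slides 12–13 walk through the lock-free read protocol: a writer locks, updates, then unlocks-and-increments the version, while a reader checks that the version is unchanged and the lock is clear. The Python below is a single-process sketch of that protocol under assumed names (`VersionedObject` and its methods are illustrative, not FaRM code), followed by the wrap-time arithmetic behind the ~2 ms read deadline on slide 13.

```python
# Illustrative sketch of the versioned read protocol from slides 12-13.
# Everything runs in one process; names are hypothetical.

class VersionedObject:
    def __init__(self, data):
        self.version = 0        # 64-bit in the slide-12 scheme, so it never wraps
        self.locked = False
        self.data = data

    def update(self, new_data):
        # Writer: lock, update, then unlock-and-increment the version.
        assert not self.locked
        self.locked = True
        self.data = new_data
        self.version += 1       # unlock-and-increment as one step
        self.locked = False

    def lock_free_read(self):
        # Reader: read version, read data, re-read version (three accesses).
        before = (self.version, self.locked)
        data = self.data
        after = (self.version, self.locked)
        if before == after and not before[1]:
            return data          # consistent snapshot
        return None              # inconsistent: caller retries

# Slide 13 replaces the three accesses with one RDMA read by versioning each
# cache line. A 16-bit cache-line version can wrap after 2**16 updates, so a
# read must also finish before the fastest possible wrap:
T_UPDATE_MIN_NS = 40
VERSION_BITS = 16
WRAP_NS = T_UPDATE_MIN_NS * 2**VERSION_BITS   # 2,621,440 ns, about 2.6 ms
# Applying the slide's (1 - eps) safety margin yields the ~2 ms read deadline.
```

The key trade-off the slides make: slide 12's scheme needs a 64-bit version (no wrap, but three network accesses per read), while slide 13's per-cache-line scheme gets a single RDMA read at the cost of small versions that force a bound on read duration.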
