
NVMW 2014: Extending Main Memory with Flash – the Optimized SWAP Approach


Title: Extending Main Memory with Flash-the Optimized SWAP Approach
Author: Jihyung Park, Hyuck Han, Sangyeun Cho
Memory Solutions Lab, Memory Business, Samsung Electronics


Slide transcript

  1. Extending Main Memory with Flash – the Optimized SWAP Approach. Jihyung Park, Hyuck Han, and Sangyeun Cho, Memory Solutions Lab, Memory Business.
  2. Outline: 1. Introduction, 2. Optimized SWAP, 3. Evaluation, 4. Future Work, 5. Conclusion
  3. Introduction
     Why extend main memory with flash?
     • To overcome DRAM scaling limitations and offer a large working memory
     • To reduce total cost of ownership (acquisition and operation)
     • Flash has no seek time
     • Flash has lower latency than HDD
     Two approaches toward memory extension:
     • Non-transparent approach: the application has to change
     • Transparent approach: the application is NOT aware of the underlying flash
  4. Motivation
     • The current swap algorithm is optimized for HDDs
     • Paging for a fast device: fast and simple vs. heavy and accurate
  5. Optimized SWAP
     • Swap entry search: a new search algorithm
     • I/O path optimization: swap read-ahead, I/O scheduler, swappiness
     • Swap device as backing store, inclusive vs. exclusive: we adjust the swap entry free policy so that the swap device "includes" all swapped-out pages (a sketch of this policy follows the transcript)
  6. Optimized SWAP – swap entry search
     • Tree search with a "bit tree": no pointers, and a node size of just one byte
     • Fan-out degree is 8 (each bit points to a child node)
     • An 8-level tree covers multi-terabytes of swap space
     • Search cost: O(log N)
     • Reduces the swap structure size: roughly 10MB for the current swap mechanism vs. 2MB for O-Swap, to support 32GB of swap space
     [Diagram: example bit tree]
     (A code sketch of the bit-tree search follows the transcript.)
  7. Optimized SWAP – I/O path
     • Read-ahead: no read-ahead (due to randomness); note also that the SSD has no seek time
     • I/O scheduler: NOOP (due to randomness and fast-response requirements), or bypass the scheduler
     • Swappiness: swappiness = 0
     • Swap entry reclaim policy: avoid freeing swap entries as much as possible
     (The corresponding stock Linux tunables are sketched after the transcript.)
  8. Evaluation – Memcached
     System: CPU Xeon E5-2665 (HT disabled), 16 cores, 10Gb Ethernet, SSD Samsung XS1715 (NVMe)
     Workload: YCSB, DB size 30GB, value length 2048B, 64 memcached threads, 320 clients, Get:Update = 95%:5%
     Memory configurations:
       SWAP      – DRAM 8GB + SSD swap 32GB
       OSWAP     – DRAM 8GB + SSD swap 32GB
       Full DRAM – DRAM 32GB
  9. Evaluation – Memcached. [Chart: Memcached throughput (NVMe, 10Gb network), operations per second (x10,000), for SWAP, OSWAP, and Full DRAM]
  10. Evaluation – Memcached. [Chart: SWAP performance by latency segment, operations per second (x1,000) across 256µs–512ms latency buckets, with a <1ms QoS marker]
  11. Evaluation – Memcached. [Chart: OSWAP performance by latency segment, operations per second (x1,000) across 256µs–512ms latency buckets, with a <1ms QoS marker]
  12. Evaluation – Memcached. [Chart: Full DRAM performance by latency segment, operations per second (x10,000) across 256µs–128ms latency buckets, with a <1ms QoS marker]
  13. Evaluation – Linkbench
     System: CPU Xeon E5-2665 (HT disabled), 16 cores, 10Gb Ethernet, SSD Samsung XS1715 (NVMe)
     Workload: Linkbench, DB size 30GB, 400 clients
     Memory configurations:
       SWAP      – DRAM 8GB + SSD swap 32GB
       OSWAP     – DRAM 8GB + SSD swap 32GB
       Full DRAM – DRAM 32GB
  14. Evaluation – Linkbench. [Chart: Linkbench throughput, requests per second (x1,000), for SWAP, OSWAP, and Full DRAM]
  15. Future Work
     • Rack-scale architecture: high-performance memory + high-capacity memory
     [Diagram: compute nodes (CPUs + DRAM) connected over PCIe and a memory cable to memory devices (controller + memory)]
  16. Conclusion
     • Cost-effective memory capacity
     • Exploit flash memory transparently
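
Sketch for slide 5 (inclusive backing store). The slide describes the policy only in words; the following is a minimal sketch of the idea, with hypothetical structure and function names (page_meta, read_from_swap, evict_page, etc. are illustrative, not the authors' kernel code): on swap-in the swap entry and the on-flash copy are kept instead of freed, so an unmodified page can later be reclaimed by simply dropping it, with no write-back.

    /*
     * Hypothetical sketch of the "inclusive" swap policy: the copy on the
     * swap device remains valid after swap-in, so the swap device always
     * "includes" every page that was swapped out.
     */
    #include <stdbool.h>

    struct page_meta {
        bool          dirty;          /* modified since it was swapped in?    */
        bool          has_swap_copy;  /* valid copy still on the swap device? */
        unsigned long swap_slot;
    };

    /* Placeholder I/O helpers standing in for the real swap I/O path. */
    static void read_from_swap(unsigned long slot, struct page_meta *pg) { (void)slot; (void)pg; }
    static void write_to_swap(unsigned long slot, struct page_meta *pg)  { (void)slot; (void)pg; }
    static void free_swap_slot(unsigned long slot)                       { (void)slot; }
    static unsigned long alloc_swap_slot(void)                           { return 0; }

    /* Exclusive (stock) policy: the slot is freed right after swap-in, so
     * the page must be written out again the next time it is evicted. */
    static void swap_in_exclusive(struct page_meta *pg)
    {
        read_from_swap(pg->swap_slot, pg);
        free_swap_slot(pg->swap_slot);
        pg->has_swap_copy = false;
    }

    /* Inclusive (O-Swap) policy: keep the slot; the flash copy stays valid. */
    static void swap_in_inclusive(struct page_meta *pg)
    {
        read_from_swap(pg->swap_slot, pg);
        pg->has_swap_copy = true;
    }

    /* Under memory pressure, a clean page that still has a swap copy is
     * simply dropped; only dirty pages pay for a flash write. */
    static void evict_page(struct page_meta *pg)
    {
        if (pg->has_swap_copy && !pg->dirty)
            return;                          /* no write-back needed */
        if (!pg->has_swap_copy)
            pg->swap_slot = alloc_swap_slot();
        write_to_swap(pg->swap_slot, pg);
        pg->has_swap_copy = true;
        pg->dirty = false;
    }

    int main(void)
    {
        struct page_meta pg = {0};
        evict_page(&pg);             /* first eviction: page written to flash  */
        swap_in_inclusive(&pg);      /* fault it back in, keep the flash copy  */
        evict_page(&pg);             /* clean + copy present: dropped for free */
        swap_in_exclusive(&pg);      /* stock policy frees the slot ...        */
        evict_page(&pg);             /* ... forcing another flash write here   */
        return 0;
    }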
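
Sketch for slide 6 (bit-tree swap entry search). The slide gives only the shape of the structure: one-byte nodes, fan-out 8, no pointers. A minimal sketch under those constraints is shown below; the implicit array layout, the names, and the 4-level size are assumptions for illustration, not the actual O-Swap implementation. A set bit in a node means the corresponding child subtree (or, in a leaf, the corresponding slot) still has a free swap entry, so a free slot is found by descending from the root in O(log N) byte reads.

    /*
     * Illustrative pointer-free "bit tree": every node is one byte, and the
     * tree is stored implicitly in an array (child k of node i lives at
     * index 8*i + 1 + k).  Uses the GCC/Clang __builtin_ctz builtin.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define FANOUT 8
    #define LEVELS 4   /* small tree for illustration; the slide uses 8 levels */

    struct bit_tree {
        size_t  nodes;        /* total number of one-byte nodes                */
        size_t  first_leaf;   /* array index of the first leaf node            */
        uint8_t *node;        /* bit k set => subtree/slot k has a free entry  */
    };

    static struct bit_tree *bit_tree_new(void)
    {
        size_t nodes = 0, level_nodes = 1;
        for (int l = 0; l < LEVELS; l++) {
            nodes += level_nodes;
            level_nodes *= FANOUT;
        }
        size_t leaf_nodes = level_nodes / FANOUT;   /* nodes on the last level */

        struct bit_tree *t = malloc(sizeof(*t));
        t->nodes      = nodes;
        t->first_leaf = nodes - leaf_nodes;
        t->node       = malloc(nodes);
        memset(t->node, 0xff, nodes);               /* every slot starts free  */
        return t;
    }

    /* Descend from the root following the first set bit; claim a leaf slot. */
    static long bit_tree_alloc(struct bit_tree *t)
    {
        if (t->node[0] == 0)
            return -1;                              /* no free slot anywhere   */

        size_t i = 0;
        while (i < t->first_leaf) {
            int k = __builtin_ctz(t->node[i]);      /* first child with space  */
            i = FANOUT * i + 1 + k;                 /* implicit child index    */
        }

        int k = __builtin_ctz(t->node[i]);          /* free bit inside a leaf  */
        t->node[i] &= (uint8_t)~(1u << k);
        long slot = (long)(i - t->first_leaf) * FANOUT + k;

        /* Propagate "this subtree is now full" back up toward the root. */
        while (i > 0 && t->node[i] == 0) {
            size_t parent = (i - 1) / FANOUT;
            t->node[parent] &= (uint8_t)~(1u << ((i - 1) % FANOUT));
            i = parent;
        }
        return slot;
    }

    /* Releasing a slot re-marks the whole path to the root as "has space". */
    static void bit_tree_free(struct bit_tree *t, long slot)
    {
        size_t i = t->first_leaf + (size_t)slot / FANOUT;
        t->node[i] |= (uint8_t)(1u << (slot % FANOUT));
        while (i > 0) {
            size_t parent = (i - 1) / FANOUT;
            t->node[parent] |= (uint8_t)(1u << ((i - 1) % FANOUT));
            i = parent;
        }
    }

    int main(void)
    {
        struct bit_tree *t = bit_tree_new();
        long a = bit_tree_alloc(t);                 /* expected: slot 0        */
        long b = bit_tree_alloc(t);                 /* expected: slot 1        */
        bit_tree_free(t, a);
        long c = bit_tree_alloc(t);                 /* expected: slot 0 again  */
        printf("allocated %ld, %ld, then %ld after freeing %ld\n", a, b, c, a);
        free(t->node);
        free(t);
        return 0;
    }

Because the whole structure is an array of bytes with no pointers, the metadata stays compact, which is the point of the 10MB vs. 2MB comparison on the slide.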
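
Sketch for slide 7 (I/O path settings). The slides list the settings but not how they were applied; the sketch below is an assumption, not the authors' tooling. It sets the closest stock Linux knobs: vm.swappiness, vm.page-cluster (which controls swap read-ahead; 0 disables it), and the per-device I/O scheduler. It must run as root, and "nvme0n1" is a placeholder for the swap SSD's block device; note that NVMe devices may bypass the legacy schedulers entirely, which matches the "bypass" option on the slide.

    /* User-space sketch that applies the slide 7 settings via procfs/sysfs. */
    #include <stdio.h>

    static void write_str(const char *path, const char *value)
    {
        FILE *f = fopen(path, "w");
        if (!f) {
            perror(path);
            return;
        }
        fputs(value, f);
        fclose(f);
    }

    int main(void)
    {
        write_str("/proc/sys/vm/swappiness", "0");    /* swappiness : 0        */
        write_str("/proc/sys/vm/page-cluster", "0");  /* swap read-ahead off   */
        /* "nvme0n1" is a placeholder for the swap device's name. */
        write_str("/sys/block/nvme0n1/queue/scheduler", "noop");
        return 0;
    }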
