NVMW 2014: Extending Main Memory with Flash – the Optimized SWAP Approach

Title: Extending Main Memory with Flash – the Optimized SWAP Approach
Authors: Jihyung Park, Hyuck Han, Sangyeun Cho
Memory Solutions Lab, Memory Business, Samsung Electronics

  1. Extending Main Memory with Flash – the Optimized SWAP Approach
     Jihyung Park, Hyuck Han and Sangyeun Cho
     Memory Solutions Lab, Memory Business
  2. Outline
     1. Introduction
     2. Optimized SWAP
     3. Evaluation
     4. Future Work
     5. Conclusion
  3. Introduction
     Why extend main memory with flash?
     • To overcome DRAM scaling limitations and offer large working memory
     • To reduce total cost of ownership (acquisition and operation)
     • Flash has no seek time
     • Flash has lower latency than HDD
     Two approaches toward memory extension
     • Non-transparent approach: the application has to change
     • Transparent approach: the application is NOT aware of the underlying flash
  4. Motivation
     The current swap algorithm is optimized for HDD
     Paging for a fast device
     • Fast and simple vs. heavy and accurate
  5. Optimized SWAP
     Swap entry search
     • A new search algorithm
     I/O path optimization
     • Swap read-ahead
     • I/O scheduler
     • Swappiness
     Swap device as backing store: inclusive vs. exclusive
     • We adjust the swap entry free policy so that the swap device "includes" all swapped-out pages
  6. Optimized SWAP
     Tree search
     • "Bit tree": no pointers, and each node is just one byte
     • Fan-out degree is 8 (each bit points to a child node)
     • An 8-level tree covers multiple terabytes of swap space
     • Search cost: 2O(log N)
     • Reduced swap structure size: roughly 10MB for the current swap mechanism vs. 2MB for O-Swap (to support 32GB of swap space)
     [Figure: bit tree diagram with nodes numbered 0 to 9]
  7. Optimized SWAP
     Read-ahead
     • No read-ahead (swap accesses are random)
     • Note also that an SSD has no seek time
     I/O scheduler
     • NOOP (due to randomness and fast response requirements)
     • Bypass
     Swappiness
     • swappiness: 0
     Swap entry reclaim policy
     • Avoid freeing swap entries as far as possible
  8. Evaluation - Memcached
     System
       CPU                   Xeon E5-2665 (HT disabled)
       # Core                16
       Network               10Gb Ethernet
       SSD                   Samsung XS1715 (NVMe)
       Workload              YCSB
       DB Size               30GB
       Value Length          2048B
       # memcached threads   64
       # Clients             320
       Get : Update          95% : 5%
     Memory
       SWAP                  DRAM 8GB, SSD Swap 32GB
       OSWAP                 DRAM 8GB, SSD Swap 32GB
       Full DRAM             DRAM 32GB
  9. Evaluation - Memcached
     [Chart: Memcached (NVMe, 10Gb Network); operations per second (x10,000) for SWAP, OSWAP, and Full DRAM]
  10. Evaluation - Memcached
      [Chart: SWAP Performance by Latency Segment; operations per second (x1,000) per latency bucket from 256us to 512ms, with the < 1ms QoS range marked]
  11. Evaluation - Memcached
      [Chart: OSWAP Performance by Latency Segment; operations per second (x1,000) per latency bucket from 256us to 512ms, with the < 1ms QoS range marked]
  12. Evaluation - Memcached
      [Chart: Full DRAM Performance by Latency Segment; operations per second (x10,000) per latency bucket from 256us to 128ms, with the < 1ms QoS range marked]
  13. Evaluation - Linkbench
      System
        CPU         Xeon E5-2665 (HT disabled)
        # Core      16
        Network     10Gb Ethernet
        SSD         Samsung XS1715 (NVMe)
        Workload    Linkbench
        DB Size     30GB
        # Clients   400
      Memory
        SWAP        DRAM 8GB, SSD Swap 32GB
        OSWAP       DRAM 8GB, SSD Swap 32GB
        Full DRAM   DRAM 32GB
  14. Evaluation - Linkbench
      [Chart: Linkbench; requests per second (x1,000) for SWAP, OSWAP, and Full DRAM]
  15. Future Work
      • Rack scale architecture
      • High performance memory + high capacity memory
      [Figure: compute node (CPUs, DRAM) linked through PCIe and a memory cable to a memory device (Ctrl, Memory)]
  16. Conclusion
      • Cost-effective memory capacity
      • Exploit flash memory transparently
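
The "bit tree" on slide 6 is described only in bullets, so the following is a minimal sketch of one way such a structure could work, not the authors' implementation. It assumes that a set bit means "there is a free swap slot somewhere below this child", that nodes are stored in flat per-level arrays so no pointers are needed (child i of node j sits at index 8*j + i one level down), and that the search always takes the lowest set bit. The names BitTree, alloc, free and size_bytes are hypothetical.

```python
FANOUT = 8  # one bit per child, so every node fits in a single byte


class BitTree:
    """Pointer-free tree of one-byte nodes tracking free swap slots (sketch)."""

    def __init__(self, levels):
        self.levels = levels
        # Level d holds 8**d nodes; all bits start set, meaning every slot is free.
        self.nodes = [bytearray([0xFF]) * (FANOUT ** d) for d in range(levels)]
        self.capacity = FANOUT ** levels  # number of slots covered by the leaves

    def size_bytes(self):
        """Total bytes used by all nodes on all levels."""
        return sum(len(level) for level in self.nodes)

    def alloc(self):
        """Return the lowest free slot and mark it used, or None if full."""
        if self.nodes[0][0] == 0:
            return None
        idx = 0
        for d in range(self.levels):
            byte = self.nodes[d][idx]
            bit = (byte & -byte).bit_length() - 1  # position of lowest set bit
            idx = idx * FANOUT + bit               # descend to that child
        self._mark_used(idx)
        return idx

    def _mark_used(self, slot):
        """Clear the slot's bit, then clear ancestor bits while nodes become empty."""
        idx = slot
        for d in range(self.levels - 1, -1, -1):
            idx, bit = divmod(idx, FANOUT)
            self.nodes[d][idx] &= ~(1 << bit) & 0xFF
            if self.nodes[d][idx]:
                break  # this subtree still has free slots; ancestors stay set

    def free(self, slot):
        """Set the slot's bit, then set ancestor bits that were cleared."""
        idx = slot
        for d in range(self.levels - 1, -1, -1):
            idx, bit = divmod(idx, FANOUT)
            was_empty = self.nodes[d][idx] == 0
            self.nodes[d][idx] |= 1 << bit
            if not was_empty:
                break  # ancestors already advertise free space here


if __name__ == "__main__":
    tree = BitTree(levels=3)       # 8**3 = 512 slots
    first = tree.alloc()
    second = tree.alloc()
    tree.free(first)
    print(first, second, tree.alloc(), tree.size_bytes())
```

Both alloc and free touch at most one node per level, so the cost grows with the tree depth rather than with the number of slots. In this particular layout an L-level tree occupies (8^L - 1)/7 bytes; how that relates to the 2MB figure on the slide is not stated in the deck.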

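The settings on slide 7 are applied inside the authors' modified kernel, and the deck does not show how. As a rough approximation on a stock Linux system, similar behaviour can be requested through existing tunables: vm.swappiness, vm.page-cluster (0 limits swap read-ahead to a single page), and the block device's queue/scheduler file. The sketch below only builds and prints the writes it would make; the helper names swap_tuning and render and the device name are hypothetical, and the scheduler value depends on the kernel ("noop" on older single-queue kernels, "none" on multi-queue ones).

```python
def swap_tuning(device, scheduler="noop"):
    """Map each tunable's path to the value suggested by the slide (sketch)."""
    return {
        "/proc/sys/vm/swappiness": "0",      # swappiness: 0
        "/proc/sys/vm/page-cluster": "0",    # read 2**0 = 1 page, i.e. no read-ahead
        f"/sys/block/{device}/queue/scheduler": scheduler,  # minimal I/O scheduling
    }


def render(settings):
    """Return shell-style lines describing the writes, without performing them."""
    return [f"echo {value} > {path}" for path, value in settings.items()]


if __name__ == "__main__":
    # "nvme0n1" is an assumed device name; substitute the actual swap device.
    for line in render(swap_tuning("nvme0n1", scheduler="none")):
        print(line)
```

Writing these files requires root privileges, and none of them reproduces the swap entry reclaim policy from the slide, which has no stock tunable.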