Breaking the Sound Barrier with Persistent Memory

Liqi Yi and Shylaja Kokoori (Intel)

A fully optimized HBase cluster could easily hit the limit of the underlying storage device’s capability, which is beyond the reach of software optimization alone. To get around this constraint, we need a new design that brings data processing and data storage closer together. In this presentation, we will look at how persistent memory will change the way large datasets are stored. We will review the hardware characteristics of 3D XPoint™, a new persistent memory technology with low latency and high capacity. We will also discuss opportunities for further improvement within the HBase framework using persistent memory.

Published in: Software


  1. Breaking the Sound Barrier with Persistent Memory (Liqi Yi, Shylaja Kokoori)
  2. Legal Disclaimer: Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Results have been simulated and are provided for informational purposes only. Results were derived using simulations run on an architecture simulator or model. Any difference in system hardware or software design or configuration may affect actual performance. Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document.
Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2016 Intel Corporation. All rights reserved.
  3. Motivation
     • Disk writes are not uniform: they happen in bursts, and high write bandwidth is required while flushing and compacting.
     • Read/write bandwidth inflation: each key/value (KV) pair is written to disk and read back many times due to flushes, compactions, and read caching. This inflation is especially painful when handling a large query rate on a small-memory system.
     • Data format changes (serialization/deserialization) between memory and the data store (for example, disk) add latency to the read/write path and waste CPU cycles.
  4. What do we need to bypass these issues?
     • A persistent store with much higher bandwidth
     • A larger cache for on-disk data
     • Fewer round trips for KVs between memory and the persistent store
     • Ideally, no format change when moving data between memory and the data store
     • And of course, lower latency always helps!
  5. Do we have something that fulfills these requirements?
     • PCIe SSD (NVM): faster than a SATA SSD, but much slower than memory in both latency and bandwidth; it can still be bottlenecked under heavy load, and it still requires data format changes.
     • Huge DRAM: the ideal case, solves everything, but far too expensive and subject to data loss.
     • What if we could put persistence and memory together?
  6. The solution: Persistent Memory
  7. Experiment Setup
     • A persistent memory emulation environment was used to emulate persistent memory at a range of latencies.
     • The Yahoo Cloud Serving Benchmark (YCSB) was used to drive the HBase cluster.
     • Throughput was measured as the number of queries/transactions per second.
     • Latency was measured as the round-trip time of a query.
     • The database was preloaded, and the experiment used a pure-read workload.
     • In the baseline configuration, data not available in DRAM is read from SSDs.
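The measurement methodology above (throughput as queries per second, latency as per-query round-trip time, over a preloaded, pure-read workload) can be sketched in Python. This is an illustration only: the in-memory dict is a hypothetical stand-in for the HBase cluster, not how YCSB is actually implemented.

```python
import random
import time

def run_read_benchmark(store, keys, num_ops=50_000):
    """Issue num_ops random reads and report throughput (ops/sec) and
    mean per-query latency, mirroring how a pure-read YCSB run measures
    a cluster."""
    latencies = []
    start = time.perf_counter()
    for _ in range(num_ops):
        key = random.choice(keys)
        t0 = time.perf_counter()
        _ = store[key]  # the "round trip" to the data store
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    throughput = num_ops / elapsed
    avg_latency = sum(latencies) / len(latencies)
    return throughput, avg_latency

# Preload the "database" before measuring, as in the experiment setup.
store = {f"user{i}": f"value{i}" for i in range(10_000)}
keys = list(store)
qps, lat = run_read_benchmark(store, keys)
print(f"{qps:,.0f} ops/sec, {lat * 1e6:.1f} us average latency")
```

The same two numbers (throughput and average round-trip latency) are what the following slides report for the real cluster.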
  8. Experiment Design
     The experiment was designed around the following scenarios:
     • Increase the bucket cache in persistent memory in regular increments (10%) and observe the effect on throughput and response time.
     • Restrict the input transaction count and observe the effect on throughput and response time, for the baseline and for 100% of the bucket cache in persistent memory.
     • Change the persistent memory latency and observe its impact on response time.
  9. Change in throughput as percentage of bucket cache configured in persistent memory

     | Bucket cache % in persistent memory | 0%  | 10% | 20% | 30% | 40% | 50%  | 60%  | 70%  | 80%  | 90%  | 100% |
     | Throughput (Kops/sec)               | 6.8 | 7.2 | 7.6 | 8.4 | 9.7 | 11.2 | 13.4 | 16.5 | 21.2 | 28.6 | 40.4 |

     Approximately 5x increase in throughput when all of the bucket cache is configured in persistent memory.
  10. Change in average query response time as percentage of bucket cache configured in persistent memory

     | Bucket cache % in persistent memory | 0%   | 10%  | 20%  | 30%  | 40%  | 50%  | 60%  | 70%  | 80% | 90% | 100% |
     | Average query response time (ms)    | 29.4 | 27.7 | 26.1 | 23.9 | 20.4 | 17.8 | 14.8 | 12.1 | 9.4 | 7.0 | 4.9  |

     Approximately 6x reduction in response time when all of the bucket cache is configured in persistent memory.
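The headline ratios on the two results slides follow directly from the end points of the measured series (the deck rounds the throughput gain to roughly 5x):

```python
# End points read from the two results slides:
# 0% vs 100% of the bucket cache configured in persistent memory.
throughput_kops = {0: 6.8, 100: 40.4}   # Kops/sec
response_ms = {0: 29.4, 100: 4.9}       # average query response time, ms

throughput_gain = throughput_kops[100] / throughput_kops[0]
latency_reduction = response_ms[0] / response_ms[100]

print(f"throughput gain: {throughput_gain:.1f}x")          # prints 5.9x
print(f"response time reduction: {latency_reduction:.1f}x") # prints 6.0x
```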
  11. Persistent Memory Latency Impact

     | Persistent memory read latency (ns)                        | 115  | 200  | 300  | 400  | 500  | 600  |
     | YCSB average response time (ms)                            | 39.0 | 39.0 | 40.5 | 39.6 | 38.3 | 37.9 |
     | Increased memory latency impact on YCSB response time (%)  |      | 0%   | 0%   | 1%   | 1%   | 1%   |

     A latency change between 115 ns and 500 ns increases YCSB's client response time by 1%.
  12. Current Software Support

     Graph from http://www.snia.org/sites/default/files/NVM/2016/presentations/RickCoulson_All_the_Ways_3D_XPoint_Impacts.pdf
     • Open source: http://pmem.io
       • libvmem, libvmmalloc
       • libpmem, libpmemobj, libpmemblk, libpmemlog
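The pmem.io libraries expose persistent memory through memory-mapped files: the application issues ordinary loads and stores to a byte-addressable region, then explicitly makes the stores durable (libpmem calls this pmem_persist). As a sketch of that programming model only, here is the analogous pattern with a regular file using Python's mmap; it does not use libpmem or real persistent memory, and the file path is illustrative.

```python
import mmap
import os
import tempfile

# Stand-in for a file on a DAX-mounted persistent memory device.
PATH = os.path.join(tempfile.gettempdir(), "pmem_demo.bin")
SIZE = 4096

# Create and size the backing file (libpmem's pmem_map_file plays this role).
with open(PATH, "wb") as f:
    f.truncate(SIZE)

fd = os.open(PATH, os.O_RDWR)
try:
    pm = mmap.mmap(fd, SIZE)  # byte-addressable view of the file
    pm[0:5] = b"hello"        # plain stores, no serialization step
    pm.flush()                # analogous to pmem_persist(): make stores durable
    pm.close()
finally:
    os.close(fd)

# Reopen and read the bytes back to confirm they persisted.
with open(PATH, "rb") as f:
    assert f.read(5) == b"hello"
```

The absence of a serialization step in this model is exactly the opportunity the Motivation slide identifies: KV data could keep one format in memory and in the persistent store.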
  13. Summary
     • Persistent memory is faster and larger, and it is byte addressable.
     • HBase will benefit from persistent memory in the current architecture, and possibly in new architectures in the future.
     • Software support is on track.
