Redis Operation with Persistent Memory - Jakub Schmiegel, Intel

This talk covers the Redis modifications needed to use Persistent Memory with NVML. It presents a performance comparison of original Redis in its different persistence modes against our implementation, which uses the libpmemobj API. We focused on add, get, and update scenarios, as well as overall system performance under different resource conditions. We also cover the limitations of copy-on-write functionality when Persistent Memory is used, and possible solutions to them.

Published in: Technology

  • Applications organize data between 2 tiers: memory and storage. PM introduces a third tier: it is accessed like volatile memory, using load/store instructions, but retains its contents across resets.
    NVM refers to a category of solid-state storage devices: flash memory in SSDs and battery-backed memories.
    From this group we single out devices that are byte-addressable and fast enough that it is reasonable to stall a CPU load instruction while waiting for a load directly from PM.
  • 3D XPoint was announced in July 2015. It is expected to be much faster and much more durable than a traditional SSD, and denser than RAM.
    It was announced in 2 forms: an SSD drive and an NVDIMM form factor.
    NVDIMM: 3 TB per socket.
  • Here you can see the NVDIMM software architecture as used by a sample Linux software stack.
    The BIOS describes DIMMs to the OS via the NFIT (NVDIMM Firmware Interface Table). It is defined in ACPI, an open standard that the OS can use to discover and configure hardware.
    ACPI includes the NFIT beginning with version 6.0. The NFIT provides the information needed to enumerate NVDIMMs and to associate them with the System Physical Address ranges they create.
    NVDIMM Driver: interfaces the OS and file system to the NVDIMMs. It allows all NVDIMMs to be managed. Included in the kernel from 4.0.
    On the left: its function is to provide the interfaces to manageability software.
    Block support: allows block access through a block file system and the standard file API, or directly through standard raw device access.

    Persistent mode: the driver exposes the memory to the OS as a device. On that device we can create a file system. Linux 4.0 has DAX.
    The DAX code removes the extra copy to the page cache by performing reads and writes directly to the storage device.
    On that file system we can create a file and mmap it in the application. Once the application has access, it may use the memory directly and perform loads/stores without the driver.
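
The mmap path in the note above can be sketched with an ordinary file. This is only an illustration: the temporary file stands in for a file on a DAX-mounted PM file system (an assumption, since no PM hardware is involved), and Python's mmap.flush() is the msync step of a regular page-cache mapping.

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for a file on a DAX-mounted PM file system.
path = os.path.join(tempfile.mkdtemp(), "pm_file")
SIZE = 4096

# Create the file and give it a size; an empty file cannot be mapped.
with open(path, "wb") as f:
    f.truncate(SIZE)

# Map the file and update it with plain byte stores, without write() calls.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), SIZE) as pm:
        pm[0:5] = b"hello"   # store into the mapping
        first = pm[0:5]      # load from the mapping
        pm.flush()           # msync: push dirty pages to the backing file

# Map it again, as a restarted process would, and read the data back.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), SIZE) as pm:
        recovered = bytes(pm[0:5])
```

After the second mapping, `recovered` holds the bytes stored through the first one, which is the property the application relies on once it works with loads and stores only.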
  • mmap gives very raw access: it returns only an address. malloc and free can be built on top of the sbrk() system call.
    If a program allocates a range of memory but dies before linking it, the result is a PM leak. That is not a problem with volatile memory, but on PM we need a PM-aware version of malloc.
    This is the place for NVML. It helps developers with common problems such as memory allocation and transactional updates.
    It is a user-space library. It operates outside of the kernel, so it is faster and more efficient.
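
To make "transactional updates" concrete, here is a toy undo-log transaction. It is a sketch under assumptions, not NVML code: the class name UndoLogTx is hypothetical, a bytearray stands in for a PM region, and the undo log lives in ordinary memory, whereas a real PM library would have to keep its log in PM so that it survives a crash.

```python
class UndoLogTx:
    """Toy undo-log transaction over a bytearray standing in for a PM region."""

    def __init__(self, region):
        self.region = region
        self.undo = []  # list of (offset, old_bytes), oldest first

    def write(self, offset, data):
        # Save the bytes about to be overwritten, then apply the new data.
        old = bytes(self.region[offset:offset + len(data)])
        self.undo.append((offset, old))
        self.region[offset:offset + len(data)] = data

    def commit(self):
        # The new data stays; the saved copies are no longer needed.
        self.undo.clear()

    def abort(self):
        # Restore saved bytes in reverse order to get back the original state.
        for offset, old in reversed(self.undo):
            self.region[offset:offset + len(old)] = old
        self.undo.clear()


region = bytearray(8)

tx = UndoLogTx(region)
tx.write(0, b"AAAA")
tx.write(2, b"BBBB")   # overlaps the first write
tx.abort()
after_abort = bytes(region)

tx = UndoLogTx(region)
tx.write(0, b"AAAA")
tx.commit()
after_commit = bytes(region)
```

An aborted transaction leaves the region exactly as it was, even when writes overlap, which is why the log is replayed in reverse.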
  • It also helps with another issue.
    msync is needed to flush data from cache lines and PM device caches.
    CLFLUSHOPT: pushes data out of the processor cache.
    PCOMMIT: commits all changes to PM.
    Both are included in the NVML code.
  • NVML consists of 6 libraries, including libpmemobj.
    libpmemobj operates on a memory pool created with pmemobj_create(). A reference to the pool is passed to the other functions.
    We can use atomic allocations, or take advantage of the transactional API, which executes a series of operations in a transaction.
    Objects are represented by PMEMoids, each consisting of a pool ID and an offset inside the pool.
    We can assign a type number to an object during allocation and then iterate through all objects of a given type.
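
The idea of a handle made of a pool ID and an offset, plus iteration by type number, can be sketched with a toy pool. Everything here is hypothetical (ToyPool, Oid, the 8-byte header, the bump allocator); it shows the concept and is not the on-media layout or API of libpmemobj.

```python
import struct
from collections import namedtuple

# Handle that stays valid wherever the pool is mapped: pool ID plus offset.
Oid = namedtuple("Oid", ["pool_id", "offset"])

# Hypothetical per-object header: (type number, payload size).
HEADER = struct.Struct("<II")


class ToyPool:
    def __init__(self, pool_id, size):
        self.pool_id = pool_id
        self.buf = bytearray(size)  # stand-in for the mapped PM file
        self.top = 0                # bump-allocation cursor

    def alloc(self, type_num, payload):
        start = self.top
        end = start + HEADER.size + len(payload)
        if end > len(self.buf):
            raise MemoryError("pool full")
        HEADER.pack_into(self.buf, start, type_num, len(payload))
        self.buf[start + HEADER.size:end] = payload
        self.top = end
        return Oid(self.pool_id, start + HEADER.size)

    def direct(self, oid):
        # Resolve a handle to the object's bytes inside this pool.
        if oid.pool_id != self.pool_id:
            raise ValueError("handle belongs to another pool")
        _, size = HEADER.unpack_from(self.buf, oid.offset - HEADER.size)
        return bytes(self.buf[oid.offset:oid.offset + size])

    def objects_of_type(self, type_num):
        # Walk the headers from the start of the pool.
        pos = 0
        while pos < self.top:
            found_type, size = HEADER.unpack_from(self.buf, pos)
            if found_type == type_num:
                yield Oid(self.pool_id, pos + HEADER.size)
            pos += HEADER.size + size


KEY, VALUE = 1, 2
pool = ToyPool(pool_id=7, size=1024)
k1 = pool.alloc(KEY, b"key:a")
v1 = pool.alloc(VALUE, b"value-a")
k2 = pool.alloc(KEY, b"key:b")
keys = [pool.direct(oid) for oid in pool.objects_of_type(KEY)]
```

Because handles store offsets rather than raw addresses, they remain meaningful if the pool is mapped at a different address after a restart; walking all objects of one type is also the kind of scan that lets an index such as a hashtable be rebuilt at startup.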
  • Why Redis: it is the most popular key-value store, and modifying it is relatively easy.
    We separated user data, kept in PM, from internal data, kept in RAM (such as the dictionary hashtable, which can be rebuilt at startup).
    PM mode is enabled in the config file; AOF and RDB are then disabled.
    It is hard to be faster than Redis on RAM, but we can compete with the persistent modes.
    Tested on a 2-socket, 16-core machine with 256 GB of DDR3, part exposed as RAM and part as PM through the NVDIMM driver.
    For the RDB and AOF modes, storage was an 800 GB Intel SSD attached via NVMe.
  • The results are shown as the number of operations per second.
    Tests were run with different object sizes, using 500,000 objects.
    ADD, UPDATE: PM is 50-60% slower than RDB and up to 2 times faster than AOF.
    RDB snapshots are triggered every second. AOF triggers fsync on every command.
    PM provides consistency at the command level, which is much better than RDB and similar to AOF.
    PM is slower than RAM because it uses NVML to keep data consistent and to manage persistency.
    There is more to compare than operations per second, and those other features open up new possibilities.
  • Startup time: PM is always the fastest, almost independent of DB size, at around 1 second. RDB and AOF are several times slower.
    PM startup time is not zero because the dictionary hashtable is rebuilt at startup. It could be zero if Redis were modified.
  • Diagram for a dataset size of around 5 GB.
    DRAM usage is reduced in PM mode (10 MB), while AOF uses 7 GB. This scenario shows that we need less DRAM in the system.
  • No persistence: performance drops when the dataset size reaches 90% of available DRAM.
  • RDB performance drops when the dataset size is around 50%.
    The same holds for AOF.
  • PM performance is not related to the installed RAM size and is still stable when the dataset size is above 150%.
  • We know the results for one instance: PM is 2 times faster.
  • 10 times faster.
  • Run on a dual-threaded CPU with 8 physical cores.
    Still over 15 times faster.
    The difference comes from the fact that AOF uses system calls, which trigger file system and I/O operations.
    With PM we don't use the file system or the kernel.
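
The per-command cost attributed to AOF above can be seen in a minimal append-only log. This is a sketch under assumptions: the text line format and the helper names aof_set and replay are hypothetical and much simpler than Redis's real AOF format. The point is that every command pays for a write and an fsync system call, and that startup has to replay the whole log.

```python
import os
import tempfile


def aof_set(fd, store, key, value):
    """Apply a SET in memory, then append it to the log and force it to disk."""
    store[key] = value
    os.write(fd, f"SET {key} {value}\n".encode())  # system call: write
    os.fsync(fd)                                   # system call: fsync per command


def replay(path):
    """Rebuild the in-memory store by re-reading every logged command."""
    store = {}
    with open(path, encoding="utf-8") as log:
        for line in log:
            _, key, value = line.rstrip("\n").split(" ", 2)
            store[key] = value
    return store


path = os.path.join(tempfile.mkdtemp(), "toy.aof")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)

store = {}
aof_set(fd, store, "a", "1")
aof_set(fd, store, "b", "hello world")
aof_set(fd, store, "a", "2")   # an update is logged as another command
os.close(fd)

recovered = replay(path)
```

With data kept in mapped PM, the update itself is a store into the mapping, so neither the per-command system calls nor the replay step is needed.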
  • There are some constraints in the persistent programming model.
    When a process executes fork, the parent process pages are marked as read-only. When a page is modified, it is duplicated by the copy-on-write mechanism.
    There is no second pool, so there is nowhere to duplicate to.
    Fork is not needed for persistence, since the RDB and AOF modes are no longer necessary, but some features, such as replication, still use it.
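
The fork constraint can be observed with anonymous mappings. This is a sketch under assumptions: it uses anonymous memory rather than a PM pool, assumes a Unix system with os.fork, and the helper name child_sees_parent_write is hypothetical. A private mapping is duplicated by copy-on-write, so the child keeps a frozen view; a shared mapping, which is how a file-backed pool is typically mapped, has only one copy, so the child sees the parent's later writes.

```python
import mmap
import os


def child_sees_parent_write(flags):
    """Return True if a forked child observes a write the parent makes after fork()."""
    region = mmap.mmap(-1, mmap.PAGESIZE, flags=flags | mmap.MAP_ANONYMOUS)
    region[0:3] = b"old"

    ready_r, ready_w = os.pipe()     # parent -> child: "the write is done"
    result_r, result_w = os.pipe()   # child -> parent: bytes the child read

    pid = os.fork()
    if pid == 0:
        # Child: wait for the parent's write, then report what is visible here.
        os.read(ready_r, 1)
        os.write(result_w, region[0:3])
        os._exit(0)

    # Parent: modify the region after the fork, then let the child look.
    region[0:3] = b"new"
    os.write(ready_w, b"x")
    seen = os.read(result_r, 3)
    os.waitpid(pid, 0)

    for fd in (ready_r, ready_w, result_r, result_w):
        os.close(fd)
    region.close()
    return seen == b"new"


private_visible = child_sees_parent_write(mmap.MAP_PRIVATE)  # copy-on-write view
shared_visible = child_sees_parent_write(mmap.MAP_SHARED)    # single shared copy
```

Features that fork in order to work on a stable snapshot depend on the first behavior, which is what a single shared pool cannot provide.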
  • I invite you to visit and contribute to our repository.

    1. Jakub Schmiegel
    2. Storage media: Non-Volatile Memory, Persistent Memory
    3. [no text on slide]
    4. Persistent Memory Programming Model
    5. Persistent Memory [diagram: User Space, Kernel Space, Application, Load/Store, MMU Mappings, NVDIMM Driver, temporary file, Standard File API, PM-aware File System, NVML]
    6. Why persistency is an issue? [diagram: CPU, L1, L2, L3, iMC, NVDIMM; instructions: CLFLUSH, CLFLUSHOPT, CLWB, SFENCE, PCOMMIT, SFENCE]
    7. NVML / libpmemobj: Objects, Transactions. http://pmem.io/nvml/ https://github.com/pmem/nvml/
    8. Redis Adaptation: Modifications, Benchmarking
    9. [chart: SET operation; x-axis: object size, 32 to 8192; y-axis: operations/sec; series: No Persist, RDB, PM, AOF]
    10. [chart: Startup time; x-axis: object size, 32 to 8192; y-axis: seconds; series: RDB, AOF, PM]
    11. [chart: DRAM usage; x-axis: objects size, 32 to 8192; y-axis: DRAM allocations [MB]; series: AOF, RDB, PM]
    12. [chart: Running out of DRAM; x-axis: % of OS memory, 36 to 141; y-axis: operations/sec; series: No Persist]
    13. [chart: Running out of DRAM; x-axis: % of OS memory, 36 to 141; y-axis: operations/sec; series: No Persist, RDB, AOF]
    14. [chart: Running out of DRAM; x-axis: % of OS memory, 36 to 141; y-axis: operations/sec; series: No Persist, RDB, PM, AOF]
    15. [chart: CPU usage; x-axis: objects size, 32 to 8192; y-axis: operations/sec; series: PM 1X, AOF 1X]
    16. [chart: CPU usage; x-axis: objects size, 32 to 8192; y-axis: operations/sec; series: PM 1X, PM 4X, AOF 1X, AOF 4X]
    17. [chart: CPU usage; x-axis: objects size, 32 to 8192; y-axis: operations/sec; series: PM 1X, PM 4X, PM 6X, PM 10X, AOF 1X, AOF 4X, AOF 10X]
    18. Is Redis Ready? fork()
    19. https://github.com/pmem/redis
