Persistent memory

Persistent Memory
Dr. Benoit Hudzia
@blopeur
benoit@stratoscale.com
Agenda
NVM Evolution
Persistent Memory Linux Software Stack
Using and Emulating PMEM on Linux
Remote PMEM
Micro Storage Architecture
NVM Evolution
Persistent Memory
Yesterday : Battery-backed RAM
Today : NVDIMM with RAM + Flash
On power down, copy to Flash; on power up, copy back to RAM
Emerging NVDIMM : PCM, 3D XPoint, Memristor, etc.
Claimed to offer up to 1000x the speed of NAND -> closer to RAM
Characteristics as seen by software : Synchronous Model
Load / Store memory instructions
New Generation HW NVM is no longer the bottleneck
But still limited by Block stack latency + Asynchronous Model
Asynchronous Model : NVMe
“When Poll is Better than Interrupt”, Yang et al., USENIX FAST 2012 https://www.usenix.org/legacy/events/fast12/tech/full_papers/Yang.pdf
● Active polling (SYNC) gives lower latency, at the expense of CPU, vs MSI-X interrupts (ASYNC)
● Used in Intel SPDK
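The trade-off between the two completion models can be sketched in user space. This is a toy illustration, not SPDK or NVMe driver code: `FakeDevice`, `wait_by_polling` and `wait_by_interrupt` are hypothetical names, and a timer thread stands in for the hardware.

```python
import threading


class FakeDevice:
    """Hypothetical device that completes a request after a fixed delay."""

    def __init__(self, delay_s=0.01):
        self.delay_s = delay_s
        self.done = False             # completion flag checked by the poller
        self.irq = threading.Event()  # stands in for an MSI-X interrupt
        self.result = None

    def submit(self, payload):
        def complete():
            self.result = payload
            self.done = True
            self.irq.set()

        threading.Timer(self.delay_s, complete).start()


def wait_by_polling(dev):
    """Synchronous model: spin on the completion flag (burns CPU)."""
    spins = 0
    while not dev.done:
        spins += 1
    return dev.result, spins


def wait_by_interrupt(dev):
    """Asynchronous model: sleep until signalled (frees the CPU, adds wake-up cost)."""
    dev.irq.wait()
    return dev.result
```

The spin count returned by `wait_by_polling` is a rough stand-in for the CPU time that polling trades for lower completion latency.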
Enter persistent Memory
[Figure: 4KB read vs 64B read. Source: Intel]
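The 4KB versus 64B contrast on this slide comes down to transfer granularity. As a worked example (the 4096-byte and 64-byte units are taken from the slide labels, 64 bytes also being the x86 cache line size; the function name is made up for illustration and accesses are assumed to be aligned), the bytes moved to touch a small field are:

```python
BLOCK_SIZE = 4096  # bytes per block I/O (the "4KB Read" label)
CACHE_LINE = 64    # bytes per load/store transfer (the "64B Read" label)


def bytes_moved(payload, unit):
    """Bytes transferred when `payload` bytes are accessed in whole `unit`s."""
    units = -(-payload // unit)  # ceiling division
    return units * unit


# Reading an 8-byte field through each path:
amplification = bytes_moved(8, BLOCK_SIZE) / bytes_moved(8, CACHE_LINE)
print(amplification)  # 64.0
```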
Moving away from Block I/O
[Figure labels: Latency, Access]
Lead to a new Tiered Software Stack
Challenge: Durability
PMEM Linux Software Stack
Linux kernel (>= 4.2) subsystem
NVDIMM Software Architecture
http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
BTT vs DAX
BTT : Block Translation Table
Provides atomic sector update semantics for persistent memory devices
Applications that rely on sector writes not being torn can continue to do so
For legacy applications
DAX : Direct Access
Allows mapping a pmem range directly into userspace via mmap
If the application is aware of persistent, byte-addressable memory and can use it to its advantage, DAX is the best path for it
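How BTT avoids torn sectors can be illustrated with a toy in-memory model. This is a simplified sketch, not the on-media layout defined in the NVDIMM Namespace Specification: the class and field names are invented here, and a real implementation also keeps a log and handles concurrency. The only idea shown is that a write goes to a spare block first and becomes visible through a single map update.

```python
class ToyBTT:
    """Toy block translation table: logical sector -> physical block."""

    def __init__(self, nsectors, sector_size=512):
        self.sector_size = sector_size
        # One extra physical block serves as the free (spare) block.
        self.blocks = [bytes(sector_size) for _ in range(nsectors + 1)]
        self.map = list(range(nsectors))  # logical -> physical
        self.free = nsectors              # index of the spare block

    def read(self, lba):
        return self.blocks[self.map[lba]]

    def write(self, lba, data, crash_before_commit=False):
        if len(data) != self.sector_size:
            raise ValueError("data must be exactly one sector")
        # 1. Write the new contents out of place, into the spare block.
        self.blocks[self.free] = bytes(data)
        if crash_before_commit:
            return  # simulated power loss: the map still points at the old data
        # 2. Commit by swapping one map entry; the old block becomes the spare.
        self.map[lba], self.free = self.free, self.map[lba]
```

A reader therefore sees either the complete old sector or the complete new one, never a mix of the two.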
Using and Emulating PMEM on Linux
Kernel Config ( >= 4.2 )
Enable NVDIMM dynamic debug before you start playing with NVDIMMs
Add to the kernel command line:
libnvdimm.dyndbg nfit.dyndbg nd_pmem.dyndbg nd_blk.dyndbg ignore_loglevel
Pick your PMEM
Use ACPI 6.0 compatible NVDIMM hardware or legacy NVDIMMs
Use virtual NVDIMMs provided by hypervisor
RAM as persistent memory
PCMSIM: NVM-disk Emulation
Emulation : RAM as PMEM
Bare metal :
Adding 'memmap=16G!16G' to the kernel boot parameters reserves 16G of memory, starting at the 16G offset.
cat /proc/cmdline :
BOOT_IMAGE=/boot/vmlinuz-4.3.0-1-default root=UUID=39635fd6-64ee-4538-9964-7de6bb181181 resume=/dev/sda1 splash=silent quiet showopts memmap=1G!5G memmap=1G!7G
BTT works
QEMU NVDIMM
Qemu :
qemu-system-x86_64 -object memory-backend-file,share,id=mem1,mem-path=/dax/D1 -device nvdimm,memdev=mem1,reserve-label-data,id=nv1 -m 2048,maxmem=100G,slots=10 ….
Not yet in Upstream Qemu :
https://github.com/xiaogr/qemu/tree/nvdimm-v9
Seabios integration :
http://www.seabios.org/pipermail/seabios/2015-September/009770.html
Playing with DAX
Only ext2, ext4 and xfs currently support DAX
Note that block size should match page size
mkfs.ext4 -b 4096 /dev/pmem1
mount -t ext4 -o dax /dev/pmem1 /tmp/dax/
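Before relying on DAX it can help to confirm two things from this slide: that the mount actually carries the dax option, and that the file system block size equals the page size. A sketch assuming the /proc/mounts text format; the helper names and the sample line are illustrative only.

```python
import mmap
import os


def has_dax_option(mounts_text, mount_point):
    """True if `mount_point` appears in /proc/mounts-style text with a dax option."""
    target = mount_point.rstrip("/") or "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == target:
            options = fields[3].split(",")
            # Plain "dax" is the form used on the slide; later kernels
            # also accept "dax=always".
            return any(o in ("dax", "dax=always") for o in options)
    return False


def block_size_matches_page_size(path):
    """Compare the file system block size at `path` with the system page size."""
    return os.statvfs(path).f_bsize == mmap.PAGESIZE


sample = "/dev/pmem1 /tmp/dax ext4 rw,relatime,dax 0 0"
print(has_dax_option(sample, "/tmp/dax/"))  # True
```

On a real system the first argument would be the text read from /proc/mounts.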
Playing with DAX - Cont
Then you just have to mmap it!
But remember: CLFLUSH, etc. for durability
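A minimal sketch of the mmap path. It uses `mmap.flush()` (an msync call) as the durability point instead of user-space cache-flush instructions, which Python cannot issue directly. The function names are hypothetical, and the path in the usage note assumes the /tmp/dax mount from the previous slide.

```python
import mmap
import os


def persist_bytes(path, offset, data, length=mmap.PAGESIZE):
    """Map `path`, store `data` at `offset`, and flush before returning."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, length)          # size the file to cover the mapping
        with mmap.mmap(fd, length) as m:  # shared read/write mapping by default
            m[offset:offset + len(data)] = data  # plain stores into the mapping
            m.flush()                     # msync: not durable before this returns
    finally:
        os.close(fd)


def read_back(path, offset, size):
    """Re-read the bytes through the regular file API."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

For example, `persist_bytes("/tmp/dax/demo", 0, b"hello")` followed by `read_back("/tmp/dax/demo", 0, 5)`. Note that the sketch resizes the file to `length`, so it should only be pointed at a scratch file.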
NVML : Let somebody else do the heavy lifting
http://pmem.io/
libpmem - Basic persistence handling
libvmmalloc - Transparently converts all dynamic memory allocations into persistent memory allocations
libpmemblk - Block access to pmem
libpmemlog - Log file on pmem (append-mostly)
libpmemobj - Transactional object store on pmem
Many more: pynvm, C++ bindings, etc.
Remote PMEM
Remote NVMe : using RDMA to transfer NVMe commands & data
http://blog.pmcs.com/flash-memory-summit-2015-special-nvm-express-rdma-awesome/
Transitioning from Indirect to Direct Flow
● Project Donard ( PMC - Microsemi)
● struct page backed Pmem patch (I/O memory is normally accessed via PFN only)
Comes with Challenge : Durability vs Visibility
http://www.snia.org/sites/default/files/SDC15_presentations/persistant_mem/ChetDouglas_RDMA_with_PM.pdf
RDMA + DDIO
RDMA + Non Allocating write
Peer 2 Peer : Bypassing CPU + SW bottleneck
● NVM HW - Expose BAR address
● March 16 : RFC patchset for DAX allowing DMA to I/O mem
● CCIX fabric
● Use case:
○ Pre-process in data path
○ Avoid RAM buffer (HMM style)
○ SW only fetches what is necessary
Future Hyperscale Architecture
NVMe gravy train for 3-5 years
Transition to Pmem optimised apps and
Natural evolution of Ethernet Connected Drive => Fabric connected Pmem
Durable Array of Wimpy Nodes
Direct PMEM
Low power High perf K/V storage
Use pluggable front end
Links
Drivers specs: http://pmem.io/documents/
NVDIMM Namespace Specification: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
NVDIMM Drivers Writers Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
NVDIMM DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
Linux docs: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/nvdimm/nvdimm.txt
Qemu : https://github.com/xiaogr/qemu/tree/nvdimm-v9
Seabios : http://www.seabios.org/pipermail/seabios/2015-September/009770.html
Libraries:
https://github.com/pmem/nvml/
https://github.com/perone/pynvm
http://opennvm.github.io/index.html
https://github.com/spdk/spdk
Project :
PMFS : https://github.com/linux-pmfs/pmfs
NOVA: NOn-Volatile memory Accelerated log-structured file system https://github.com/NVSL/NOVA
PCMSIM : https://code.google.com/p/pcmsim/
Patch :
Donard: A PCIe Peer-2-Peer kernel patch https://github.com/sbates130272/donard
Adds struct page backing for IO memory and as such allows IO memory to be used as a DMA target : http://www.spinics.net/lists/linux-mm/msg103990.html
Thank You!
Questions ?
NVDIMM block I/O path
Persistent memory

  • 1. Persistent Memory Dr. Benoit Hudzia @blopeur benoit@stratoscale.com
  • 2. Agenda NVM Evolution Persistent Memory Linux Software Stack Using , Emulating PMEM on Linux Remote PMEM Micro Storage Architecture
  • 4. Persistent Memory Yesterday : Battery Backed RAM Today : NVDIMM with RAM + FLASH Power Down - copy to Flash, Power Up copy Back to RAM Emerging NVDIMM : PCM - 3DX Point - Memristor - etc… Offer 1000x speed vs NAND -> closer to RAM Characteristics as seen by software : Synchronous Model Load / Store memory instruction
  • 5. New Generation HW NVM is no longer the bottleneck But still limited by Block stack latency + Asynchronous Model
  • 6. Asynchronous Model : NVMe “When Poll is Better than Interrupt” Yang & Al . Usenix Fast 2012 https://www.usenix.org/legacy/events/fast12/tech/full_papers/Yang.pdf ● Active Polling ( SYNC ) lower latency ( at the expense of CPU) vs interrupt MSI-X (ASYNC) ● Used in Intel SPDK
  • 7. Enter persistent Memory Source: Intel 4KB Read 64B Read
  • 8. Moving away from Block I/O L A T E N C Y A C C E S S
  • 9. Lead to a new Tiered Software Stack
  • 12. Linux kernel (>4.2) subsystem
  • 14. BTT vs DAX. BTT (Block Translation Table): provides atomic sector update semantics for persistent memory devices, so applications that rely on sector writes not being torn can continue to do so. Intended for legacy applications. DAX (Direct Access): allows mapping a pmem range directly into userspace via mmap. If the application is aware of persistent, byte-addressable memory and can use it to its advantage, DAX is the best path for it
  • 15. Using and Emulating PMEM on Linux
  • 16. Kernel Config ( > 4.2 ) Enable NVDIMM dynamic debug before you start playing with NVDIMMs Add to the kernel cmd line: libnvdimm.dyndbg nfit.dyndbg nd_pmem.dyndbg nd_blk.dyndbg ignore_loglevel
  • 17. Pick your PMEM: use ACPI 6.0 compatible NVDIMM hardware or legacy NVDIMMs; use virtual NVDIMMs provided by the hypervisor; use RAM as persistent memory; PCMSIM: NVM-disk emulation
  • 18. Emulation: RAM as PMEM. Bare metal: adding 'memmap=16G!16G' to the kernel boot parameters reserves 16G of memory, starting at 16G. cat /proc/cmdline : BOOT_IMAGE=/boot/vmlinuz-4.3.0-1-default root=UUID=39635fd6-64ee-4538-9964-7de6bb181181 resume=/dev/sda1 splash=silent quiet showopts memmap=1G!5G memmap=1G!7G. BTT works
  • 19. QEMU NVDIMM. Qemu: qemu-system-x86_64 -object memory-backend-file,share,id=mem1,mem-path=/dax/D1 -device nvdimm,memdev=mem1,reserve-label-data,id=nv1 -m 2048,maxmem=100G,slots=10 …. Not yet in upstream Qemu: https://github.com/xiaogr/qemu/tree/nvdimm-v9 Seabios integration: http://www.seabios.org/pipermail/seabios/2015-September/009770.html
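
As a quick way to check which physical ranges a given command line reserves, the following Python sketch parses `memmap=<size>!<start>` tokens. It assumes decimal numbers with an optional K/M/G/T suffix, as in the examples on this slide; the function name `pmem_regions` is hypothetical.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}
_MEMMAP = re.compile(r"memmap=(\d+)([KMGT]?)!(\d+)([KMGT]?)", re.IGNORECASE)


def pmem_regions(cmdline):
    """Return (start, end) byte ranges for every memmap=size!start token."""
    regions = []
    for token in cmdline.split():
        match = _MEMMAP.fullmatch(token)
        if match is None:
            continue  # not a memmap=nn!ss parameter
        size, size_unit, start, start_unit = match.groups()
        size_bytes = int(size) * _UNITS[size_unit.upper()]
        start_bytes = int(start) * _UNITS[start_unit.upper()]
        regions.append((start_bytes, start_bytes + size_bytes))
    return regions
```

For example, `pmem_regions(open("/proc/cmdline").read())` lists the ranges requested on the running system; each one should show up as its own /dev/pmemN device if the reservation succeeded.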
  • 20. Playing with DAX. Only ext2, ext4 and xfs currently support DAX. Note that the block size should match the page size: mkfs.ext4 -b 4096 /dev/pmem1 then mount -t ext4 -o dax /dev/pmem1 /tmp/dax/
  • 21. Playing with DAX (cont.) Then you just have to mmap it! But remember: CLFLUSH, etc. for durability
  • 22. NVML: let somebody else do the heavy lifting. http://pmem.io/ libpmem: basic persistency handling. libvmmalloc: transparently converts all dynamic memory allocations into persistent memory allocations. libpmemblk: block access to pmem. libpmemlog: log file on pmem (append-mostly). libpmemobj: transactional object store on pmem. Many more: pynvm, C++ bindings, etc.
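
A small Python sketch for checking the block-size note on an already mounted filesystem. It assumes that `f_bsize` as reported by `statvfs` reflects the filesystem block size, which holds for ext4 on Linux but is not guaranteed everywhere; the function name is hypothetical.

```python
import os


def block_size_matches_page_size(mount_point):
    """Compare the filesystem block size with the system page size."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    block_size = os.statvfs(mount_point).f_bsize
    return block_size == page_size
```

Calling it with the mount point from this slide (`/tmp/dax/`) gives a quick sanity check before relying on the `dax` mount option.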
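
A minimal Python sketch of the "mmap it, then make it durable" pattern. Python cannot issue cache-line flush instructions directly, so this version uses `mmap.flush()`, which calls `msync()` and asks the kernel to make the stores durable; treat it as the portable fallback rather than the fastest path. The function names and the default size are assumptions for illustration, and the code works on any regular file, not only one on a DAX mount.

```python
import mmap
import os


def persist_record(path, payload, size=4096):
    """Store payload at offset 0 of a memory-mapped file and flush it."""
    if len(payload) > size:
        raise ValueError("payload larger than mapping")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)           # the file must back the whole mapping
        with mmap.mmap(fd, size) as mm:  # shared read/write mapping by default
            mm[:len(payload)] = payload  # plain store into the mapping
            mm.flush()                   # msync(): request durability
    finally:
        os.close(fd)


def read_record(path, length):
    """Read the record back through the regular file interface."""
    with open(path, "rb") as f:
        return f.read(length)
```

On the DAX mount from the previous slide the call would look like `persist_record("/tmp/dax/record.bin", b"hello pmem")`, where the file name is made up. Libraries such as libpmem exist precisely to replace the `msync()` step with user-space flushes when the platform supports them.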
  • 24. Remote NVMe : using RDMA to transfer NVMe commands & data http://blog.pmcs.com/flash-memory-summit-2015-special-nvm-express-rdma-awesome/
  • 25. Transitioning from Indirect to Direct Flow ● Project Donard (PMC - Microsemi) ● struct page backed pmem patch (I/O memory is normally accessed via PFN only)
  • 26. Comes with Challenge : Durability vs Visibility http://www.snia.org/sites/default/files/SDC15_presentations/persistant_mem/ChetDouglas_RDMA_with_PM.pdf
  • 28. RDMA + Non Allocating write
  • 29. Peer 2 Peer : Bypassing CPU + SW bottleneck ● NVM HW - Expose BAR address ● March 16 : RFC patchset for DAX allowing DMA to I/O mem ● CCIX fabric ● Use case: ○ Pre-process in Data path ○ Avoid RAM buffer ( HMM style ) ○ SW only fetch what is necessary
  • 30. Future Hyperscale Architecture. NVMe gravy train for 3-5 years. Transition to pmem-optimised apps and natural evolution of Ethernet-connected drives => fabric-connected pmem. Durable Array of Wimpy Nodes: direct PMEM, low power, high-performance K/V storage, pluggable front end
  • 31. Links. Driver specs: http://pmem.io/documents/ ; NVDIMM Namespace Specification: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf ; NVDIMM Driver Writers Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf ; NVDIMM DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf ; ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf ; Linux docs: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/nvdimm/nvdimm.txt ; Qemu: https://github.com/xiaogr/qemu/tree/nvdimm-v9 ; Seabios: http://www.seabios.org/pipermail/seabios/2015-September/009770.html. Libraries: https://github.com/pmem/nvml/ ; https://github.com/perone/pynvm ; http://opennvm.github.io/index.html ; https://github.com/spdk/spdk. Projects: PMFS: https://github.com/linux-pmfs/pmfs ; NOVA (NOn-Volatile memory Accelerated log-structured file system): https://github.com/NVSL/NOVA ; PCMSIM: https://code.google.com/p/pcmsim/. Patches: Donard, a PCIe peer-2-peer kernel patch: https://github.com/sbates130272/donard ; a patch that adds struct page backing for I/O memory and so allows I/O memory to be used as a DMA target: http://www.spinics.net/lists/linux-mm/msg103990.html