SlideShare a Scribd company logo
1 of 26
Download to read offline
15th ANNUAL WORKSHOP 2019
REMOTE PERSISTENT MEMORY
PERFORMANCE, CAPACITY, OR PERSISTENCE?
Paul Grun
[ March 20, 2019 ]
Cray Inc
OpenFabrics Alliance Workshop 20192
This is an update on a long-
running Work in Progress
(from last year’s Workshop)
AT LAST YEAR’S WORKSHOP …
OpenFabrics Alliance Workshop 20193
…we began to focus on
“use cases”
The discussion
continued at FMS 2018
And continues again
today
Objective is to define a set of use
cases and requirements that can
be used to drive an API definition
THREE CATEGORIES OF USE CASES WERE DESCRIBED
OpenFabrics Alliance Workshop 20194
FLASH MEMORY SUMMIT 2018
OpenFabrics Alliance Workshop 20195
We began to describe
Consumer Requirements and
System Objectives that will
impact the network
architecture needed to support
RPM
PERSISTENT MEMORY SUMMIT 2019
OpenFabrics Alliance Workshop 20196
We began to discover that use
cases imply certain system
characteristics
Persistence is only one
MOVING THE BALL A LITTLE FURTHER DOWNFIELD
▪ Objective for this session – integrate the various characteristics of RPM into a discussion
of the use cases already presented
▪ Ultimate goal – Propose an API that meets the requirements described by the set of use
cases
7 OpenFabrics Alliance Workshop 2019
INTRODUCING THE ‘CHARACTERISTICS’ VARIABLE
▪ Many view the emerging Persistent Memory layer in the memory hierarchy as
monolithic, evolving toward Nirvana
• Nirvana defined as “infinite capacity, infinite bandwidth, zero latency, zero cost”
• Oh, and “infinite retention”
▪ The truth is that there will always be tradeoffs
• Performance vs Capacity vs Cost
• Local vs Remote
▪ How to choose the right tradeoffs?
8 OpenFabrics Alliance Workshop 2019
Our objective today is to take a refined look at the
emerging list of use cases and try to understand which
characteristics matter most
Assertion – understanding these characteristics, and
how they map onto different use cases, will guide
the development of networks to support RPM.
(Which is our ultimate goal.)
THE FAMILIAR MEMORY HIERARCHY
OpenFabrics Alliance Workshop 20199
memory
storage
It’s a dessert topping!
It’s a floor wax! *
It’s clear that Persistent
Memory isn’t exactly memory,
and it’s not precisely storage…
* With thanks to SNL, 1/10/76
…so how do we characterize it?
What role does it fill, exactly?
THE FAMILIAR MEMORY HIERARCHY …
OpenFabrics Alliance Workshop 201910
memory
storage
… with a wrinkle
Local
Remote
capacity
performance
…and there are tradeoffs
within the sublayers
Turns out that this new
layer isn’t monolithic…
PM
cost
HotColdWarm
KEY DRIVERS
OpenFabrics Alliance Workshop 201911
Application requirements
Is data being shared among threads or nodes?
Are there application performance or capacity requirements?
Key system design objectives Scalability? In which dimension? Single server? Cluster?
Selecting the right technology
depends on understanding (at
least):
Eventual API proposal should reflect a combination of
Use Cases and App Requirements
OpenFabrics Alliance Workshop 2019 12
Database
Applications
• A modifiable,
in-memory
database that
survives power
cycles
Data Analytics
• Create a
persistent
database once,
run new
queries
repeatedly
Graph Analytics
• Operate on
larger graphs
than would fit
in local
memory
Commercial
Applications
• Enable
collaboration
on large scale
projects
HPC
Applications
• Scalability,
parallel
applications
• Checkpointing
EXAMPLE TARGETS FOR PM
capacity, density, performance persistence, capacity, costpersistence, capacity
USE CASES, SO FAR - WIP
▪ Data Availability/Protection
Replicate local cache to RPM to achieve data availability
▪ Improved Uptime, Fast Restart
Quick server recovery following power cycle
Checkpoint restart
▪ Local System Performance
Eliminate disk accesses e.g. to stored databases
▪ Scale Up Architectures
In-memory databases that exceed local DRAM capacity
▪ Scale Out Architectures
Distributed databases, analytics applications, HPC parallel applications
▪ Disaggregated System Architectures
Compute capacity scales independently of memory capacity
▪ Shared Data
Support simultaneous data access from multiple processes
A central shared repository for a distributed team collaborating on a large artifact
▪ Improved Disk Storage Performance
13 OpenFabrics Alliance Workshop 2019
- First developed at last year’s
RPM Think Tank,
- Revised and extended at
Flash Memory Summit 2018,
- And again at the PM Summit
2019
SOME APPLICATION CHARACTERISTICS
▪ Application Objectives
Performance vs capacity?
▪ Sharing Models
Shared data vs unshared data?
A shared service vs a dedicated service?
▪ Memory Model
Flat address space vs object stores?
▪ Characteristic Traffic Patterns
Small byte operations vs bulk data transfer?
▪ Ordering Semantics, Atomicity
▪ …
14 OpenFabrics Alliance Workshop 2019
These aren’t exactly “Use Cases”, but will clearly impact the API design
PERSISTENCE? NOT ALWAYS REQUIRED
▪ Persistence is valuable for:
High Availability applications where maintaining state between power cycles is crucial
Reducing or eliminating the need to access slower media, e.g. HDDs
Data protection and preservation
▪ Persistence not required, but nice to have:
Certain applications, such as analytics, that require establishing a database. Build the database
once, run multiple queries against it
Collaborative workspaces
▪ Other characteristics may prove to be more valuable than persistence
15 OpenFabrics Alliance Workshop 2019
If the app doesn’t need persistence, then the so-called convergence of storage and memory is uninteresting
FOR EXAMPLE…
▪ Performance
• Persistence often comes at the cost of performance (but not always)
▪ Cost
• If you can accept a lower level of performance, or you do not care much about byte addressability,
there may be lower cost options available
▪ Capacity
• To achieve higher capacity, you might wish to use a different technology, sacrificing e.g. byte
addressability for higher capacity
16 OpenFabrics Alliance Workshop 2019
1ST ORDER TRADEOFF: LOCAL VS REMOTE
▪ Some requirements are met by siting persistent memory devices on the local compute node
Capacity-based applications
Some data protection usages
Replacement of local storage for performance reasons
▪ Others are only achieved by distributing persistent memory
Compute/memory disaggregation
independent scaling of compute and memory
Shared resource / shared data
Team collaboration
Distributed/Scale-out applications
17 OpenFabrics Alliance Workshop 2019
Needless to say, this is our
focus at the moment - RPM
Local may be synchronous, Remote is almost certain to be asynchronous
▪ Data Availability/Protection
Replicate local cache to RPM to achieve high availability
▪ Local System Performance
Eliminate disk accesses
▪ Scale Out Architectures
Scale out distributed databases, analytics applications, HPC parallel applications
▪ Scale Up Architectures
Scale up databases that exceed local memory capacity
▪ Disaggregated System Architectures
Compute capacity scales independently of memory capacity
▪ Shared Data
Support simultaneous data access to large teams
▪ Improved Uptime, Fast Restart
Quick server recovery following power cycle
Checkpoint restart
USE CASES – LOCAL PM
18 OpenFabrics Alliance Workshop 2019
Local Performance
Scale Up Architectures
Fast Restart
√√√
√
√√√
√√
√√√
√
√√
√√√
√
Persistence Performance Capacity
these need to be refined and
developed in much more detail
REMOTE PM – SYSTEM AND MEMORY MODELS
OpenFabrics Alliance Workshop 201919
Organized into pools,
accessed as memory
Can be configured as a flat address
space, or as object storage.
Or both.
NIC
CPU
DDR
NIC
CPU
DDR
NIC
CPU
DDR
. . .
NIC NICNIC
network
RPM
service
node
RPM
service
node
RPM
service
node
Shared or unshared resource
All will have a significant impact on the API
USE CASES – REMOTE PM
▪ Data Availability/Protection
Replicate local cache to RPM to achieve high availability
▪ Improved Uptime, Fast Restart
Quick server recovery following power cycle
Checkpoint restart
▪ Local System Performance
Eliminate disk accesses e.g. to stored databases
▪ Scale Out Architectures
Scale out distributed databases, analytics applications, HPC parallel applications
▪ Scale Up Architectures
Scale up databases that exceed local memory capacity
▪ Disaggregated System Architectures
Compute capacity scales independently of memory capacity
▪ Shared Data
Support simultaneous data access from multiple processes
A central shared repository for a distributed team collaborating on a large artifact
20 OpenFabrics Alliance Workshop 2019
DATA PROTECTION USE CASE
OpenFabrics Alliance Workshop 201921
What it looks like
How it works
Usage: replicate data that is stored in local PM
across a fabric and store it in remote PM
Data Availability √√√ √√ √
Persistence Performance Capacity
Checkpoint √√√ √√ √√
SCALE OUT USE CASE
OpenFabrics Alliance Workshop 201922
Usage: Expand on-node memory capacity, while taking
advantage of persistence (or not). Disaggregate memory
from compute.
remote
memory
service
PM
PM
PM
app
DDR
NIC
app
DDR
NIC
…
user Remote PM
completion
put
“Scalable Memory”
Scale Out
Disaggregation
√
√√
√√√
√√√
√√√
√√√
Persistence Performance Capacity
SHARED DATA USE CASE
OpenFabrics Alliance Workshop 201923
How it works
Usage: Information is shared among the
elements of a distributed application. Persistence
can be used to guard against node failure.
PM
app
NIC
app
NIC
Remote Shared
Memory Service
user
completion
user
put get
notice
Remote
PM
Shared Data √√ √√ √√√
Persistence Performance Capacity
SOME PRELIMINARY OBSERVATIONS
▪ Non-persistent use case don’t require flush semantics
▪ RPM is NUMA
▪ APIs for local vs remote PM are likely to be different, because of asynchronicity
▪ Capacity use cases likely have different access patterns than e.g. performance use
cases
• large reads / writes vs byte level accesses
▪ For persistence use cases, some should be ‘automatic’ e.g. Data Protection, others
should be ‘on-demand’
▪ Distinguish between the access method that the client sees vs the technology that is
implemented on the RPM node
• They are very different things
▪ Consider the chicken and the egg – it’s hard to predict what will be needed for new
application models
PM as an accelerator for existing application models, or
PM as an enabler of new application models
24 OpenFabrics Alliance Workshop 2019
NEXT STEPS
1. Commit the foregoing to text, allowing us to dig into the details
2. Begin thinking about what this implies for the API
25 OpenFabrics Alliance Workshop 2019
15th ANNUAL WORKSHOP 2019
THANK YOU

More Related Content

What's hot

Solutions_using_Blades_ITO0108
Solutions_using_Blades_ITO0108Solutions_using_Blades_ITO0108
Solutions_using_Blades_ITO0108
H Nelson Stewart
 

What's hot (18)

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
IBM BladeCenter: Build smarter IT. IBM Systems and Technology
IBM BladeCenter: Build smarter IT. IBM Systems and TechnologyIBM BladeCenter: Build smarter IT. IBM Systems and Technology
IBM BladeCenter: Build smarter IT. IBM Systems and Technology
 
SAP HANA on POWER9 systems
SAP HANA on POWER9 systemsSAP HANA on POWER9 systems
SAP HANA on POWER9 systems
 
Solutions_using_Blades_ITO0108
Solutions_using_Blades_ITO0108Solutions_using_Blades_ITO0108
Solutions_using_Blades_ITO0108
 
Redefining SAP technology landscape
Redefining SAP technology landscapeRedefining SAP technology landscape
Redefining SAP technology landscape
 
Capacity Efficiency: Identifying the Right Solutions for the Right Challenge
Capacity Efficiency: Identifying the Right Solutions for the Right ChallengeCapacity Efficiency: Identifying the Right Solutions for the Right Challenge
Capacity Efficiency: Identifying the Right Solutions for the Right Challenge
 
IMS05 IMS V14 8gb osam for haldb
IMS05   IMS V14 8gb osam for haldbIMS05   IMS V14 8gb osam for haldb
IMS05 IMS V14 8gb osam for haldb
 
Hp - 14oct2010
Hp - 14oct2010Hp - 14oct2010
Hp - 14oct2010
 
How and why to upgrade to hitachi device manager v7 webinar
How and why to upgrade to hitachi device manager v7 webinarHow and why to upgrade to hitachi device manager v7 webinar
How and why to upgrade to hitachi device manager v7 webinar
 
Qmf for z os nordic db2 days andy
Qmf for z os nordic db2 days andyQmf for z os nordic db2 days andy
Qmf for z os nordic db2 days andy
 
IMS04 BMC Software Strategy and Roadmap
IMS04   BMC Software Strategy and RoadmapIMS04   BMC Software Strategy and Roadmap
IMS04 BMC Software Strategy and Roadmap
 
The IBM Data Engine for NoSQL on IBM Power Systems™
The IBM Data Engine for NoSQL on IBM Power Systems™The IBM Data Engine for NoSQL on IBM Power Systems™
The IBM Data Engine for NoSQL on IBM Power Systems™
 
Superior Cloud Economics with IBM Power Systems
Superior Cloud Economics with IBM Power SystemsSuperior Cloud Economics with IBM Power Systems
Superior Cloud Economics with IBM Power Systems
 
Hadoop in the Enterprise: Legacy Rides the Elephant
Hadoop in the Enterprise: Legacy Rides the ElephantHadoop in the Enterprise: Legacy Rides the Elephant
Hadoop in the Enterprise: Legacy Rides the Elephant
 
S108283 svc-storwize-lagos-v1905d
S108283 svc-storwize-lagos-v1905dS108283 svc-storwize-lagos-v1905d
S108283 svc-storwize-lagos-v1905d
 
Greenplum Database Overview
Greenplum Database Overview Greenplum Database Overview
Greenplum Database Overview
 
A Time Traveller’s Guide to DB2: Technology Themes for 2014 and Beyond
A Time Traveller’s Guide to DB2: Technology Themes for 2014 and BeyondA Time Traveller’s Guide to DB2: Technology Themes for 2014 and Beyond
A Time Traveller’s Guide to DB2: Technology Themes for 2014 and Beyond
 
Future of Power: IBM PureFlex - Kim Mortensen
Future of Power: IBM PureFlex - Kim MortensenFuture of Power: IBM PureFlex - Kim Mortensen
Future of Power: IBM PureFlex - Kim Mortensen
 

Similar to Characteristics of Remote Persistent Memory – Performance, Capacity, or Locality. Which One(s)?

Net App Unified Storage Architecture
Net App Unified Storage ArchitectureNet App Unified Storage Architecture
Net App Unified Storage Architecture
Samantha_Roehl
 
GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017
Jeremy Maranitch
 
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
Fujitsu India
 
NetApp Pure Storage - A Business Intelligence PPT
NetApp Pure Storage - A Business Intelligence PPTNetApp Pure Storage - A Business Intelligence PPT
NetApp Pure Storage - A Business Intelligence PPT
Shridhar Shriraghavan
 

Similar to Characteristics of Remote Persistent Memory – Performance, Capacity, or Locality. Which One(s)? (20)

New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
 
Workload Centric Scale-Out Storage for Next Generation Datacenter
Workload Centric Scale-Out Storage for Next Generation DatacenterWorkload Centric Scale-Out Storage for Next Generation Datacenter
Workload Centric Scale-Out Storage for Next Generation Datacenter
 
Macroview Netapp Overview
Macroview Netapp OverviewMacroview Netapp Overview
Macroview Netapp Overview
 
NetApp All Flash storage
NetApp All Flash storageNetApp All Flash storage
NetApp All Flash storage
 
Net App Unified Storage Architecture
Net App Unified Storage ArchitectureNet App Unified Storage Architecture
Net App Unified Storage Architecture
 
Net App Unified Storage Architecture
Net App Unified Storage ArchitectureNet App Unified Storage Architecture
Net App Unified Storage Architecture
 
Why is Virtualization Creating Storage Sprawl? By Storage Switzerland
Why is Virtualization Creating Storage Sprawl? By Storage SwitzerlandWhy is Virtualization Creating Storage Sprawl? By Storage Switzerland
Why is Virtualization Creating Storage Sprawl? By Storage Switzerland
 
Storage Considerations for VDI - Scalar presentation at Toronto VMUG 2014
Storage Considerations for VDI - Scalar presentation at Toronto VMUG 2014Storage Considerations for VDI - Scalar presentation at Toronto VMUG 2014
Storage Considerations for VDI - Scalar presentation at Toronto VMUG 2014
 
Best practices: running high-performance databases on Kubernetes
Best practices: running high-performance databases on KubernetesBest practices: running high-performance databases on Kubernetes
Best practices: running high-performance databases on Kubernetes
 
GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017
 
HPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataHPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big Data
 
Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)
 
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
 
Distributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2lDistributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2l
 
NetApp Pure Storage - A Business Intelligence PPT
NetApp Pure Storage - A Business Intelligence PPTNetApp Pure Storage - A Business Intelligence PPT
NetApp Pure Storage - A Business Intelligence PPT
 
Pro Tips: When to Choose Private vs Public vs Hybrid Cloud
Pro Tips: When to Choose Private vs Public vs Hybrid CloudPro Tips: When to Choose Private vs Public vs Hybrid Cloud
Pro Tips: When to Choose Private vs Public vs Hybrid Cloud
 
20230614 LinuxONE Distinguished_Recognition ISSIP_Award_Talk.pptx
20230614 LinuxONE Distinguished_Recognition ISSIP_Award_Talk.pptx20230614 LinuxONE Distinguished_Recognition ISSIP_Award_Talk.pptx
20230614 LinuxONE Distinguished_Recognition ISSIP_Award_Talk.pptx
 
Caching for Microservices Architectures: Session I
Caching for Microservices Architectures: Session ICaching for Microservices Architectures: Session I
Caching for Microservices Architectures: Session I
 
Data Engine for NoSQL - IBM Power Systems
Data Engine for NoSQL - IBM Power SystemsData Engine for NoSQL - IBM Power Systems
Data Engine for NoSQL - IBM Power Systems
 
Panasonic information systems
Panasonic information systemsPanasonic information systems
Panasonic information systems
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
inside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Characteristics of Remote Persistent Memory – Performance, Capacity, or Locality. Which One(s)?

  • 1. 15th ANNUAL WORKSHOP 2019 REMOTE PERSISTENT MEMORY PERFORMANCE, CAPACITY, OR PERSISTENCE? Paul Grun [ March 20, 2019 ] Cray Inc
  • 2. OpenFabrics Alliance Workshop 20192 This is an update on a long- running Work in Progress (from last year’s Workshop)
  • 3. AT LAST YEAR’S WORKSHOP … OpenFabrics Alliance Workshop 20193 …we began to focus on “use cases” The discussion continued at FMS 2018 And continues again today Objective is to define a set of use cases and requirements that can be used to drive an API definition
  • 4. THREE CATEGORIES OF USE CASES WERE DESCRIBED OpenFabrics Alliance Workshop 20194
  • 5. FLASH MEMORY SUMMIT 2018 OpenFabrics Alliance Workshop 20195 We began to describe Consumer Requirements and System Objectives that will impact the network architecture needed to support RPM
  • 6. PERSISTENT MEMORY SUMMIT 2019 OpenFabrics Alliance Workshop 20196 We began to discover that use cases imply certain system characteristics Persistence is only one
  • 7. MOVING THE BALL A LITTLE FURTHER DOWNFIELD ▪ Objective for this session – integrate the various characteristics of RPM into a discussion of the use cases already presented ▪ Ultimate goal – Propose an API that meets the requirements described by the set of use cases 7 OpenFabrics Alliance Workshop 2019
  • 8. INTRODUCING THE ‘CHARACTERISTICS’ VARIABLE ▪ Many view the emerging Persistent Memory layer in the memory hierarchy as monolithic, evolving toward Nirvana • Nirvana defined as “infinite capacity, infinite bandwidth, zero latency, zero cost” • Oh, and “infinite retention” ▪ The truth is that there will always be tradeoffs • Performance vs Capacity vs Cost • Local vs Remote ▪ How to choose the right tradeoffs? 8 OpenFabrics Alliance Workshop 2019 Our objective today is to take a refined look at the emerging list of use cases and try to understand which characteristics matter most Assertion – understanding these characteristics, and how they map onto different use cases, will guide the development of networks to support RPM. (Which is our ultimate goal.)
  • 9. THE FAMILIAR MEMORY HIERARCHY OpenFabrics Alliance Workshop 20199 memory storage It’s a dessert topping! It’s a floor wax! * It’s clear that Persistent Memory isn’t exactly memory, and it’s not precisely storage… * With thanks to SNL, 1/10/76 …so how do we characterize it? What role does it fill, exactly?
  • 10. THE FAMILIAR MEMORY HIERARCHY … OpenFabrics Alliance Workshop 201910 memory storage … with a wrinkle Local Remote capacity performance …and there are tradeoffs within the sublayers Turns out that this new layer isn’t monolithic… PM cost HotColdWarm
  • 11. KEY DRIVERS OpenFabrics Alliance Workshop 201911 Application requirements Is data being shared among threads or nodes? Are there application performance or capacity requirements? Key system design objectives Scalability? In which dimension? Single server? Cluster? Selecting the right technology depends on understanding (at least): Eventual API proposal should reflect a combination of Use Cases and App Requirements
  • 12. OpenFabrics Alliance Workshop 2019 12 Database Applications • A modifiable, in-memory database that survives power cycles Data Analytics • Create a persistent database once, run new queries repeatedly Graph Analytics • Operate on larger graphs than would fit in local memory Commercial Applications • Enable collaboration on large scale projects HPC Applications • Scalability, parallel applications • Checkpointing EXAMPLE TARGETS FOR PM capacity, density, performance persistence, capacity, costpersistence, capacity
  • 13. USE CASES, SO FAR - WIP ▪ Data Availability/Protection Replicate local cache to RPM to achieve data availability ▪ Improved Uptime, Fast Restart Quick server recovery following power cycle Checkpoint restart ▪ Local System Performance Eliminate disk accesses e.g. to stored databases ▪ Scale Up Architectures In-memory databases that exceed local DRAM capacity ▪ Scale Out Architectures Distributed databases, analytics applications, HPC parallel applications ▪ Disaggregated System Architectures Compute capacity scales independently of memory capacity ▪ Shared Data Support simultaneous data access from multiple processes A central shared repository for a distributed team collaborating on a large artifact ▪ Improved Disk Storage Performance 13 OpenFabrics Alliance Workshop 2019 - First developed at last year’s RPM Think Tank, - Revised and extended at Flash Memory Summit 2018, - And again at the PM Summit 2019
  • 14. SOME APPLICATION CHARACTERISTICS ▪ Application Objectives Performance vs capacity? ▪ Sharing Models Shared data vs unshared data? A shared service vs a dedicated service? ▪ Memory Model Flat address space vs object stores? ▪ Characteristic Traffic Patterns Small byte operations vs bulk data transfer? ▪ Ordering Semantics, Atomicity ▪ … 14 OpenFabrics Alliance Workshop 2019 These aren’t exactly “Use Cases”, but will clearly impact the API design
  • 15. PERSISTENCE? NOT ALWAYS REQUIRED ▪ Persistence is valuable for: High Availability applications where maintaining state between power cycles is crucial Reducing or eliminating the need to access slower media, e.g. HDDs Data protection and preservation ▪ Persistence not required, but nice to have: Certain applications, such as analytics, that require establishing a database. Build the database once, run multiple queries against it Collaborative workspaces ▪ Other characteristics may prove to be more valuable than persistence 15 OpenFabrics Alliance Workshop 2019 If the app doesn’t need persistence, then the so-called convergence of storage and memory is uninteresting
  • 16. FOR EXAMPLE… ▪ Performance • Persistence often comes at the cost of performance (but not always) ▪ Cost • If you can accept a lower level of performance, or you do not care much about byte addressability, there may be lower cost options available ▪ Capacity • To achieve higher capacity, you might wish to use a different technology, sacrificing e.g. byte addressability for higher capacity 16 OpenFabrics Alliance Workshop 2019
  • 17. 1ST ORDER TRADEOFF: LOCAL VS REMOTE ▪ Some requirements are met by siting persistent memory devices on the local compute node Capacity-based applications Some data protection usages Replacement of local storage for performance reasons ▪ Others are only achieved by distributing persistent memory Compute/memory disaggregation independent scaling of compute and memory Shared resource / shared data Team collaboration Distributed/Scale-out applications 17 OpenFabrics Alliance Workshop 2019 Needless to say, this is our focus at the moment - RPM Local may be synchronous, Remote is almost certain to be asynchronous
  • 18. ▪ Data Availability/Protection Replicate local cache to RPM to achieve high availability ▪ Local System Performance Eliminate disk accesses ▪ Scale Out Architectures Scale out distributed databases, analytics applications, HPC parallel applications ▪ Scale Up Architectures Scale up databases that exceed local memory capacity ▪ Disaggregated System Architectures Compute capacity scales independently of memory capacity ▪ Shared Data Support simultaneous data access to large teams ▪ Improved Uptime, Fast Restart Quick server recovery following power cycle Checkpoint restart USE CASES – LOCAL PM 18 OpenFabrics Alliance Workshop 2019 Local Performance Scale Up Architectures Fast Restart √√√ √ √√√ √√ √√√ √ √√ √√√ √ Persistence Performance Capacity these need to be refined and developed in much more detail
  • 19. REMOTE PM – SYSTEM AND MEMORY MODELS OpenFabrics Alliance Workshop 201919 Organized into pools, accessed as memory Can be configured as a flat address space, or as object storage. Or both. NIC CPU DDR NIC CPU DDR NIC CPU DDR . . . NIC NICNIC network RPM service node RPM service node RPM service node Shared or unshared resource All will have a significant impact on the API
  • 20. USE CASES – REMOTE PM ▪ Data Availability/Protection Replicate local cache to RPM to achieve high availability ▪ Improved Uptime, Fast Restart Quick server recovery following power cycle Checkpoint restart ▪ Local System Performance Eliminate disk accesses e.g. to stored databases ▪ Scale Out Architectures Scale out distributed databases, analytics applications, HPC parallel applications ▪ Scale Up Architectures Scale up databases that exceed local memory capacity ▪ Disaggregated System Architectures Compute capacity scales independently of memory capacity ▪ Shared Data Support simultaneous data access from multiple processes A central shared repository for a distributed team collaborating on a large artifact 20 OpenFabrics Alliance Workshop 2019
  • 21. DATA PROTECTION USE CASE OpenFabrics Alliance Workshop 201921 What it looks like How it works Usage: replicate data that is stored in local PM across a fabric and store it in remote PM Data Availability √√√ √√ √ Persistence Performance Capacity Checkpoint √√√ √√ √√
  • 22. SCALE OUT USE CASE OpenFabrics Alliance Workshop 201922 Usage: Expand on-node memory capacity, while taking advantage of persistence (or not). Disaggregate memory from compute. remote memory service PM PM PM app DDR NIC app DDR NIC … user Remote PM completion put “Scalable Memory” Scale Out Disaggregation √ √√ √√√ √√√ √√√ √√√ Persistence Performance Capacity
  • 23. SHARED DATA USE CASE OpenFabrics Alliance Workshop 201923 How it works Usage: Information is shared among the elements of a distributed application. Persistence can be used to guard against node failure. PM app NIC app NIC Remote Shared Memory Service user completion user put get notice Remote PM Shared Data √√ √√ √√√ Persistence Performance Capacity
  • 24. SOME PRELIMINARY OBSERVATIONS ▪ Non-persistent use case don’t require flush semantics ▪ RPM is NUMA ▪ APIs for local vs remote PM are likely to be different, because of asynchronicity ▪ Capacity use cases likely have different access patterns than e.g. performance use cases • large reads / writes vs byte level accesses ▪ For persistence use cases, some should be ‘automatic’ e.g. Data Protection, others should be ‘on-demand’ ▪ Distinguish between the access method that the client sees vs the technology that is implemented on the RPM node • They are very different things ▪ Consider the chicken and the egg – it’s hard to predict what will be needed for new application models PM as an accelerator for existing application models, or PM as an enabler of new application models 24 OpenFabrics Alliance Workshop 2019
  • 25. NEXT STEPS 1. Commit the foregoing to text, allowing us to dig into the details 2. Begin thinking about what this implies for the API 25 OpenFabrics Alliance Workshop 2019
  • 26. 15th ANNUAL WORKSHOP 2019 THANK YOU