SC15 PMIx Birds-of-a-Feather

Process Management Interface – Exascale
Agenda
• Overview
 Introductions
 Vision/objectives
• Performance status
• Integration status
• Roadmap
• Malleable application support
• Wrap-Up/Open Forum
PMIx – PMI exascale
Collaborative open source effort led by Intel, Mellanox
Technologies, IBM, Adaptive Computing, and SchedMD.
New collaborators are most welcome!
Contributors
• Intel
 Ralph Castain
 Annapurna Dasari
• Mellanox
 Joshua Ladd
 Artem Polyakov
 Elena Shipunova
 Nadezhda Kogteva
 Igor Ivanov
• HP
 David Linden
 Andy Riebs
• IBM
 Dave Solt
• Adaptive Computing
 Gary Brown
• RIST
 Gilles Gouaillardet
• SchedMD
 David Bigagli
• LANL
 Nathan Hjelm
Motivation
• Exascale launch times are a hot topic
 Desire: reduce from many minutes to a few seconds
 Target: O(10^6) MPI processes on O(10^5) nodes through MPI_Init in < 30 seconds
• New programming models are exploding
 Driven by need to efficiently exploit scale vs.
resource constraints
 Characterized by increased app-RM
integration
[Figure: timeline of job stages from MPI_Init to MPI_Finalize (Launch, Initialization, Exchange MPI contact info, Setup MPI structures, barrier, mpi_init completion, barrier). RRZ, 16 nodes, 8 ppn, rank 0. Axis: Time (sec).]
What Is Being Shared?
• Job Info (~90%)
 Names of participating nodes
 Location and ID of procs
 Relative ranks of procs (node, job)
 Sizes (#procs in job, #procs on each node)
• Endpoint info (~10%)
 Contact info for each supported fabric
Job info: known to local RM daemon
Endpoint info: can be computed for many fabrics
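To illustrate why most of this data need not be exchanged between processes, here is a minimal sketch (not PMIx code) of how job info of the kind listed above could be derived locally from what the RM already knows. The function name `job_info_for`, the node names, and the block mapping with an equal number of procs per node are all assumptions made for illustration.

```python
def job_info_for(rank, nodes, ppn):
    """Derive job-level info for one rank without any inter-process exchange.

    Assumes a block mapping: ranks 0..ppn-1 run on nodes[0],
    the next ppn ranks on nodes[1], and so on.
    """
    job_size = len(nodes) * ppn
    if not 0 <= rank < job_size:
        raise ValueError("rank out of range")
    node_index, local_rank = divmod(rank, ppn)
    return {
        "job_size": job_size,          # #procs in job
        "node": nodes[node_index],     # name of the hosting node
        "node_rank": node_index,       # position of that node in the job
        "local_rank": local_rank,      # rank relative to procs on the same node
        "local_size": ppn,             # #procs on this node
        "local_peers": list(range(node_index * ppn, (node_index + 1) * ppn)),
    }


# Hypothetical job shape: 16 nodes with 8 procs per node.
nodes = [f"node{i:02d}" for i in range(16)]
info = job_info_for(rank=19, nodes=nodes, ppn=8)
print(info["node"], info["local_rank"], info["job_size"])
```

Under these assumptions every value is a pure function of the node list and the procs-per-node count, which is the information the slide describes as already known to the local RM daemon.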
[Figure: launch timeline from MPI_Init to MPI_Finalize (RRZ, 16 nodes, 8 ppn, rank 0; Time in sec), annotated for Stage I.]
Stage I
• Provide method for RM to share job info
• Work with fabric and library implementers to compute endpoint info from RM info
[Figure: launch timeline from MPI_Init to MPI_Finalize (RRZ, 16 nodes, 8 ppn, rank 0; Time in sec), annotated for Stage II.]
Stage II
• Add on 1st communication (non-PMIx)
[Figure: launch timeline from MPI_Init to MPI_Finalize (RRZ, 16 nodes, 8 ppn, rank 0; Time in sec), annotated for Stage III.]
Stage III
• Use HSN+Coll
Changing Needs
• Notifications/response
 Errors, resource changes
 Negotiated response
• Request allocation changes
 shrink/expand
• Workflow management
 Steered/conditional execution
• QoS requests
 Power, file system, fabric
Multiple, use-specific libs? (difficult for RM community to support)
Single, multi-purpose lib?
Objectives
• Establish an independent, open community
 Industry, academia, lab
• Standalone client/server libraries
 Ease adoption, enable broad/consistent support
 Open source, non-copy-left
 Transparent backward compatibility
• Support evolving programming requirements
• Enable “Instant On” support
 Eliminate time-devouring steps
 Provide faster, more scalable operations
Today’s Goal
• Inform the community
• Solicit your input on the roadmap
• Get you a little excited
• Encourage participation
PMIx End Users
• OSHMEM consumers
 In Open MPI OSHMEM: shmem_init = mpi_init + C
• Job launch scales as MPI_Init.
 Data driven communication patterns
• Assume dense connectivity
• MPI consumers
 Large class of applications have sparse connectivity
• Ideal for an API that supports the direct modex concept
What’s been done
• Worked closely with customers, OEMs, and open source community to design a scalable API that addresses measured
limitations of PMI2
 Data driven design.
• Led to the PMIx v1.0 API
• Implementation and imminent release of PMIx v1.1
 November 2015 release scheduled.
• Significant architectural changes in Open MPI to support direct modex
 “Add procs” in bulk MPI_Init → “Add proc” on-demand on first use outside MPI_Init.
 Available in the OMPI v2.x release Q1 2016.
• Integrated PMIx into Open MPI v2.x
 For native launching as well as direct launching under supported RMs.
 For mpirun launched jobs, ORTE implements PMIx callbacks.
 For srun launched jobs, SLURM implements PMIx callbacks in the PMIx plugin.
 Client side framework added to OPAL with components for
• Cray PMI
• PMI1
• PMI2
• PMIx
• backwards compatibility with PMI1 and PMI2.
• Implemented and submitted upstream SLURM PMIx plugin
 Currently available in SLURM Head
 To be released in SLURM 16.05
 Client side PMIx Framework and S1, S2, PMIxxx components in OPAL
• PMIx unit tests integrated into Jenkins test harness
[Figure: srun --mpi=xxx hello_world. MPI_Init / Shmem_init time (sec., axis 0 to 8) vs. number of MPI/OSHMEM processes (0 to 4000), with series PMIx async, PMIx, and PMI2. Setup: Open MPI trunk, SLURM 16.05 prerelease with PMIx plugin, PMIx v1.1, BTLs openib,self,vader. Stage I-II.]
[Figure: srun --mpi=pmix ./hello_world. PMIx async projected performance, with a vertical axis from 0 to 80 and a horizontal axis from 0 to 1,000,000. Stage I-II.]
[Figure: mpirun/oshrun ./hello_world. Time (millisec, axis 0 to 1.4) vs. nodes (0 to 70), with series bin-true, orte-no-op, mpi hello world, and async modex.]
Daemon wireup: linear scaling needs to be addressed
Conclusions
• API is implemented and performing well in a variety of settings
 Server integrated in OMPI for native launching and in SLURM as PMIx plugin for
direct launching.
• PMIx shows improvement over other state-of-the-art PMI2
implementations when doing a full modex
 Data blobs versus encoded metakeys
 Data scoping to reduce the modex size
• PMIx supported direct modex significantly outperforms full modex
operations for BTL/MTLs that can support this feature
• Direct modex still scales as O(N)
• Efforts and energy should be focused on daemon bootstrap problem
• Instant-on capabilities could be used to further reduce daemon bootstrap time
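The difference between a full modex and a direct modex can be illustrated with a toy count of how many endpoint records are retrieved. This is only a sketch under stated assumptions: the function names are hypothetical, the ring communication pattern is made up for the example, and the count says nothing about wall-clock time or about how PMIx implements either operation.

```python
def full_modex_fetches(num_procs):
    """Full modex: every process obtains the endpoint record of every process.

    Counts one record per (process, peer) pair, including a process's own
    record, to keep the arithmetic simple.
    """
    return num_procs * num_procs


def direct_modex_fetches(peers_by_rank):
    """Direct modex: a process fetches a peer's record only on first use."""
    return sum(len(set(peers)) for peers in peers_by_rank.values())


def ring_neighbors(num_procs):
    """Toy sparse pattern: each rank talks only to its left and right neighbor."""
    return {r: [(r - 1) % num_procs, (r + 1) % num_procs] for r in range(num_procs)}


n = 1024
print(full_modex_fetches(n), direct_modex_fetches(ring_neighbors(n)))
```

For a densely connected application the two counts converge, which is why the impact of direct modex on such applications is worth investigating separately.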
Next Steps
• Leverage PMIx features
• Reduce modex size with data scoping
• Change MTL/PMLs to support direct modex
• Investigate the impact of direct modex on densely
connected applications
• Continue to improve collective performance
 Still need to have a scalable solution
• Focus more efforts on the daemon bootstrap problem –
this becomes the limiting factor moving to exascale
 Leverage instant-on here as well
Client Implementation Status
• PMIx 1.1.1 released
 Complete API definition
• Future-proof APIs with Info array/length parameter for most calls
• Blocking/non-blocking versions of most calls
• Picked up by Fedora, others to come
• PMIx MPI clients launched/tested with
 ORTE (indirect) / ORCM (direct launch)
 SLURM servers (direct launch)
 IBM PMIx server (direct launch)
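The API-shape points above (an open-ended Info parameter, plus blocking and non-blocking versions of a call) can be sketched in a few lines. This is an illustration in Python, not the PMIx C API: `fetch`, `fetch_nb`, the `"optional"` directive, and the key name are all hypothetical. In the C library, calls such as `PMIx_Get` and `PMIx_Get_nb` take a `pmix_info_t` array and its length in a similar spirit.

```python
import threading


def fetch(store, key, info=()):
    """Blocking call. `info` is an open-ended sequence of (name, value)
    directives, so new options can be added without changing the signature."""
    directives = dict(info)
    if key in store:
        return store[key]
    if directives.get("optional", False):
        return None  # caller said a missing key is acceptable
    raise KeyError(key)


def fetch_nb(store, key, callback, info=()):
    """Non-blocking variant: returns at once and later calls
    callback(error, value) from a worker thread."""
    def worker():
        try:
            callback(None, fetch(store, key, info))
        except KeyError as err:
            callback(err, None)

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


store = {"endpoint.rank3": "ib0:0x1a"}  # made-up key and value
results = []
pending = fetch_nb(store, "endpoint.rank3", lambda err, val: results.append((err, val)))
pending.join()  # wait only so the example can print the result
print(fetch(store, "missing", info=[("optional", True)]), results)
```

Passing directives as data rather than as extra positional arguments is what lets older callers keep working when new directives appear.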
Server Implementation Status
• Server implementation time is greatly
reduced through the PMIx convenience
library
 Handles all server/client interactions
 Handles many PMIx requests that can be
handled locally
 Bundles many off-host requests
• Optional
 RMs free to implement their own
Server Implementation Status
• Moab
 Integrated PMIx server in scheduler/launcher
 Currently integrating PMIx effort with Moab
 Scheduled for general availability: no time set
• ORTE/ORCM
 Full embedded PMIx reference server
implementation
 Scheduled for release with v2.0
Server Implementation Status
• SLURM
 PMIx support for initial job launch/wireup
currently developed & tested
 Scheduled for GA: 16.05 release
• IBM/LSF
 CORAL
• PMIx support for initial job launch/wireup currently
developed & tested w/PM
• Full PMIx support planned for CORAL
 Integration to LSF to follow (TBD)
Scalability
• Memory footprint
 Distributed database for storing Key-Values
• Memory cache, DHT, other models?
 One instance of database per node
• "zero-message" data access using shared-memory
• Launch scaling
 Enhanced support for collective operations
• Provide pattern to host, host-provided send/recv functions,
embedded inter-node comm?
 Rely on HSN for launch, wireup support
• While app is quiescent, then return to OOB
Flexible Allocation Support
• Request additional resources
 Compute, memory, network, NVM, burst buffer
 Immediate, forecast
 Expand existing allocation, separate allocation
• Return extra resources
 No longer required
 Will not be used for some specified time, reclaim
(handshake) when ready to use
• Notification of preemption
 Provide opportunity to cleanly pause
I/O Support
• Asynchronous operations
 Anticipatory data fetch, staging
 Advise time to complete
 Notify upon availability
• Storage policy requests
 Hot/warm/cold data movement
 Desired locations and striping/replication patterns
 Persistence of files, shared memory regions across
jobs, sessions
 ACL to generated data across jobs, sessions
Spawn Support
• Staging support
 Files, libraries required by new apps
 Allow RM to consider in scheduler
• Current location of data
• Time to retrieve and position
• Schedule scalable preload
• Provisioning requests
 Allow RM to consider in selecting resources,
minimize startup time due to provisioning
 Desired image, packages
Network Integration
• Quality of service requests
 Bandwidth, traffic priority, power constraints
 Multi-fabric failover, striping prioritization
 Security requirements
• Network domain definitions, ACLs
• Notification requests
 State-of-health
 Update process endpoint upon fault recovery
• Topology information
 Torus, dragonfly, …
Power Control/Management
• Application requests
 Advise of changing workload requirements
 Request changes in policy
 Specify desired policy for spawned applications
 Transfer allocated power to specified job
• RM notifications
 Need to change power policy
• Allow application to accept, request pause
 Preemption notification
• Provide backward compatibility with
PowerAPI
PMIx: Fault Tolerance
• Notification
 App can register for error notifications, incipient faults
• RM will notify when app would be impacted
• App responds with desired action
 Terminate/restart job, wait for checkpoint, etc.
• RM/app negotiate final response
 App can notify RM of errors
• RM will notify specified, registered procs
• Restart support
 Specify source (remote NVM checkpoint, global filesystem, etc)
 Location hints/requests
 Entire job, specific processes
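The notification flow described above (register, be notified, respond with a desired action, settle on a final response) can be sketched as follows. Everything here is hypothetical: the class `ToyResourceManager`, the `Action` values, and the rule that the RM falls back to terminating when the requested action is not permitted are assumptions chosen to make the example concrete, not PMIx behavior.

```python
from enum import Enum


class Action(Enum):
    TERMINATE = "terminate"
    RESTART = "restart"
    WAIT_FOR_CHECKPOINT = "wait_for_checkpoint"


class ToyResourceManager:
    def __init__(self):
        self._handlers = {}  # proc id -> callback taking a fault description

    def register(self, proc, handler):
        """An app process registers for error / incipient-fault notifications."""
        self._handlers[proc] = handler

    def report_fault(self, fault, impacted, allowed):
        """Notify registered, impacted procs and settle a response for each.

        Assumed negotiation rule: honor the app's desired action if the RM
        allows it, otherwise fall back to TERMINATE.
        """
        outcome = {}
        for proc in impacted:
            handler = self._handlers.get(proc)
            if handler is None:
                continue  # unregistered procs are not notified
            desired = handler(fault)
            outcome[proc] = desired if desired in allowed else Action.TERMINATE
        return outcome


rm = ToyResourceManager()
rm.register("rank0", lambda fault: Action.WAIT_FOR_CHECKPOINT)
rm.register("rank1", lambda fault: Action.RESTART)
outcome = rm.report_fault(
    "node07 predicted to fail",
    impacted=["rank0", "rank1", "rank2"],
    allowed={Action.RESTART, Action.TERMINATE},
)
print({proc: action.name for proc, action in outcome.items()})
```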
Job Types
• Rigid
• Moldable
• Malleable
• Evolving
• Adaptive (Malleable + Evolving)
Job Type Characteristics
• Resource Allocation Type
 Static
 Dynamic
Who decides vs. when it is decided:
Who decides    At job submission      During job execution
               (static allocation)    (dynamic allocation)
User           Rigid                  Evolving
Scheduler      Moldable               Malleable
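The classification above amounts to a two-key lookup. The sketch below encodes it directly; the function name `job_type` and the string labels are hypothetical, chosen only to mirror the "who decides" and "when it is decided" axes.

```python
# (who decides, when it is decided) -> job type
JOB_TYPES = {
    ("user", "submission"): "rigid",
    ("scheduler", "submission"): "moldable",
    ("user", "execution"): "evolving",
    ("scheduler", "execution"): "malleable",
}


def job_type(decider, when):
    """Look up the job type for a given decider and decision time."""
    return JOB_TYPES[(decider, when)]


def is_adaptive(types):
    """An adaptive job combines malleable and evolving behavior."""
    return {"malleable", "evolving"} <= set(types)


print(job_type("scheduler", "execution"), is_adaptive(["malleable", "evolving"]))
```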
Rigid Job
[Figure: node usage chart for a rigid job; vertical axis 1 to 8, horizontal axis 0 to 240 in steps of 15.]
• Job Resources
 4 nodes
 60 minutes
Moldable Job
[Figure: node usage chart for a moldable job; vertical axis 1 to 8, horizontal axis 0 to 240 in steps of 15.]
• Job Resources
 4 nodes/60 min
 2 nodes/120 min
 1 node/240 min
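The three shapes listed for the moldable job all represent the same amount of work in node-minutes, which a short calculation confirms. The selection rule in `pick_shape` (choose the fitting shape that finishes soonest) is an assumption for illustration; a real scheduler may weigh other factors.

```python
def node_minutes(nodes, minutes):
    """Total resource consumption of a job shape."""
    return nodes * minutes


def pick_shape(shapes, free_nodes):
    """Among shapes that fit in free_nodes, pick the one that finishes soonest.

    Returns None when no shape fits.
    """
    fitting = [shape for shape in shapes if shape[0] <= free_nodes]
    if not fitting:
        return None
    return min(fitting, key=lambda shape: shape[1])


# (nodes, minutes) options from the slide
shapes = [(4, 60), (2, 120), (1, 240)]
print([node_minutes(n, m) for n, m in shapes])
print(pick_shape(shapes, free_nodes=3))
```

The decision is made once, at submission time, which is what distinguishes a moldable job from a malleable one.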
Malleable Job
[Figure: node usage chart for a malleable job; vertical axis 1 to 8, horizontal axis 0 to 240 in steps of 15.]
• Job Resources
 1-8 nodes
 1000 node-min
Evolving Job
[Figure: node usage chart for an evolving job with phases A, B, C and D; vertical axis 1 to 8, horizontal axis 0 to 240 in steps of 15.]
Phases A, B, C and D (each phase 2x nodes of previous phase)
• Job Resources
 1-8 nodes
 1000 node-min
Adaptive Job
[Figure: node usage chart for an adaptive job with phases A, B, C, D1 and D2; vertical axis 1 to 8, horizontal axis 0 to 240 in steps of 15.]
Phases A, B, C and D (each phase 2x nodes of previous phase)
Scheduler takes 4 nodes halfway through Phase D, and phase D2 takes 2x as long as phase D1
• Job Resources
 1-8 nodes
 1000 node-min
Motivations
• New Programming Models
• New Algorithmic Techniques
• Unconventional Cluster Architectures
Adaptive Mesh Refinement
 Granularity: Coarse, Medium, Fine
 Node Allocation: Few (1), Some (4), Many (64)
Master/Slaves
[Figure: one Master and 18 Slaves.]
Secondary Simulations
[Figure: node usage chart showing a Primary Simulation and several secondary simulations (Sim2); vertical axis 1 to 8, horizontal axis 0 to 240 in steps of 15.]
Unconventional Architectures
• Cluster Booster
 Multi-core “Cluster” and many-core “Booster” in the same network domain
 Multi-core jobs dynamically burst out to parallel “booster” nodes with accelerators
Apps, RTEs and Archs
Applications
• Astrophysics
• Brain simulation
• Climate simulation
• Flow solvers (QuadFlow)
• Hydrodynamics (Lulesh)
• Molecular Dynamics (NAMD)
• Water reservoir storage/flow
• Wave propagation (Wave2D)
RTEs
• Charm++
• OmpSs
• Uintah
• Radical-Pilot
Architectures
• EU DEEP/DEEP-ER
Scheduler/Malleable Job Dialog
• Expand resource allocation
• Contract resource allocation
[Figure: dialog between Scheduler and Application: Expand (more) followed by Response; Contract (less) followed by Response.]
Scheduler/Evolving Job Dialog
• Grow resource allocation
• Shrink resource allocation
[Figure: dialog between Application and Scheduler: Grow (more) followed by Response; Shrink (less) followed by Response.]
Adaptive Job Race Condition
• Reason for naming convention
 Prevent ambiguity and confusion
[Figure: dialog between Application and Scheduler in which Grow (more) and Expand (more) are in flight at the same time, leaving the outcome in question (?).]
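One way to see why distinct names help is to encode the initiator in the request type itself. The sketch below assumes, based on the dialog slide titles and the job-type table, that Expand/Contract originate from the scheduler and Grow/Shrink from the application; the `Request` enum and `crossing` helper are hypothetical names, not part of any proposed API.

```python
from enum import Enum


class Request(Enum):
    # (initiator, direction): +1 asks for more resources, -1 for fewer
    EXPAND = ("scheduler", +1)
    CONTRACT = ("scheduler", -1)
    GROW = ("application", +1)
    SHRINK = ("application", -1)

    @property
    def initiator(self):
        return self.value[0]

    @property
    def direction(self):
        return self.value[1]


def crossing(a, b):
    """True when two in-flight requests were started by different sides."""
    return a.initiator != b.initiator


in_flight = [Request.GROW, Request.EXPAND]
print([r.name for r in in_flight], crossing(*in_flight))
```

Because each name identifies who started the exchange, both sides can at least recognize that two "more" requests have crossed, instead of mistaking one for a response to the other.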
Need for Standard API
• MPI: standard API for parallel communication
• Need standard API for application /
scheduler resource management dialogs
 Same API for applications
 Scheduler-specific API implementations
• Scheduler and Malleable/Evolvable
Application Dialog (SMEAD) API
 Make part of PMIx
 Need application use cases
Interested Parties
• Adaptive Computing (Moab scheduler, TORQUE RM, Nitro)
• Altair (PBS Pro scheduler/RM)
• Argonne National Laboratory (Cobalt scheduler)
• HLRS at University of Stuttgart
• Jülich Supercomputing Centre (DEEP-ER)
• Lawrence Livermore National Laboratory (Flux scheduler)
• Partec (ParaStation)
• SchedMD (Slurm scheduler/RM)
• TU Darmstadt Laboratory for Parallel Programming
• UK Atomic Weapons Establishment (AWE)
• University of Cambridge COSMOS
• University of Illinois at Urbana-Champaign Parallel
Programming Laboratory (Charm++ RTE)
• University of Utah SCI Institute (Uintah RTE)
URLs
• Need your help to design a standard API!
 Malleable/Evolving Application Use Case Survey
 http://goo.gl/forms/lq85y3SkV3 (Google Form)
• Info on adaptive job types and scheduling
 4-part blog about malleable / evolving / adaptive jobs and schedulers
 http://www.adaptivecomputing.com/series/malleable-and-evolving-jobs/
Bottom Line
We now have an interface library the RMs
will support for application-directed requests
Need to collaboratively define
what we want to do with it
For any programming model
MPI, OSHMEM, PGAS,…
Contribute or Follow Along!
 Project: https://pmix.github.io/master
 Code: https://github.com/pmix
Contributors/collaborators
are welcomed!
1 of 57

Recommended

HPC Controls Future by
HPC Controls FutureHPC Controls Future
HPC Controls Futurercastain
1K views74 slides
Exascale Process Management Interface by
Exascale Process Management InterfaceExascale Process Management Interface
Exascale Process Management Interfacercastain
1.3K views11 slides
SDN Architecture & Ecosystem by
SDN Architecture & EcosystemSDN Architecture & Ecosystem
SDN Architecture & EcosystemKingston Smiler
1.9K views75 slides
OpenDataPlane - Bill Fischofer by
OpenDataPlane - Bill FischoferOpenDataPlane - Bill Fischofer
OpenDataPlane - Bill Fischoferharryvanhaaren
599 views12 slides
Hyperscan - Mohammad Abdul Awal by
Hyperscan - Mohammad Abdul AwalHyperscan - Mohammad Abdul Awal
Hyperscan - Mohammad Abdul Awalharryvanhaaren
2.8K views10 slides
Open stack with_openflowsdn-torii by
Open stack with_openflowsdn-toriiOpen stack with_openflowsdn-torii
Open stack with_openflowsdn-toriiHui Cheng
4.2K views79 slides

More Related Content

What's hot

Microservice Powered Orchestration by
Microservice Powered OrchestrationMicroservice Powered Orchestration
Microservice Powered OrchestrationOpen Networking Summit
1.9K views23 slides
ODP Presentation LinuxCon NA 2014 by
ODP Presentation LinuxCon NA 2014ODP Presentation LinuxCon NA 2014
ODP Presentation LinuxCon NA 2014Michael Christofferson
500 views25 slides
DPDK Architecture Musings - Andy Harvey by
DPDK Architecture Musings - Andy HarveyDPDK Architecture Musings - Andy Harvey
DPDK Architecture Musings - Andy Harveyharryvanhaaren
1.2K views7 slides
Play With Streams by
Play With StreamsPlay With Streams
Play With StreamsTianjian Chen
461 views78 slides
SDN Project PPT by
SDN Project PPTSDN Project PPT
SDN Project PPTMatthew Chang
3.4K views26 slides
SERENE 2014 Workshop: Paper "Automatic Generation of Description Files for Hi... by
SERENE 2014 Workshop: Paper "Automatic Generation of Description Files for Hi...SERENE 2014 Workshop: Paper "Automatic Generation of Description Files for Hi...
SERENE 2014 Workshop: Paper "Automatic Generation of Description Files for Hi...SERENEWorkshop
419 views32 slides

What's hot(20)

DPDK Architecture Musings - Andy Harvey by harryvanhaaren
DPDK Architecture Musings - Andy HarveyDPDK Architecture Musings - Andy Harvey
DPDK Architecture Musings - Andy Harvey
harryvanhaaren1.2K views
SERENE 2014 Workshop: Paper "Automatic Generation of Description Files for Hi... by SERENEWorkshop
SERENE 2014 Workshop: Paper "Automatic Generation of Description Files for Hi...SERENE 2014 Workshop: Paper "Automatic Generation of Description Files for Hi...
SERENE 2014 Workshop: Paper "Automatic Generation of Description Files for Hi...
SERENEWorkshop419 views
Row #9: An architecture overview of APNIC's RDAP deployment to the cloud by APNIC
Row #9: An architecture overview of APNIC's RDAP deployment to the cloudRow #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloud
APNIC385 views
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ... by Cloud Native Day Tel Aviv
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
Developing Enterprise Applications for the Cloud, from Monolith to Microservice by Jack-Junjie Cai
Developing Enterprise Applications for the Cloud, from Monolith to MicroserviceDeveloping Enterprise Applications for the Cloud, from Monolith to Microservice
Developing Enterprise Applications for the Cloud, from Monolith to Microservice
Jack-Junjie Cai1.2K views
Apache NiFi SDLC Improvements by Bryan Bende
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC Improvements
Bryan Bende2.3K views
L4-L7 services for SDN and NVF by Youcef Laribi by buildacloud
L4-L7 services for SDN and NVF by Youcef LaribiL4-L7 services for SDN and NVF by Youcef Laribi
L4-L7 services for SDN and NVF by Youcef Laribi
buildacloud2.1K views
Architecture of OpenFlow SDNs by US-Ignite
Architecture of OpenFlow SDNsArchitecture of OpenFlow SDNs
Architecture of OpenFlow SDNs
US-Ignite4.4K views
Generic Resource Manager - László Vadkerti, András Kovács by harryvanhaaren
Generic Resource Manager - László Vadkerti, András KovácsGeneric Resource Manager - László Vadkerti, András Kovács
Generic Resource Manager - László Vadkerti, András Kovács
harryvanhaaren905 views
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ... by Brocade
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...
Brocade1.4K views
Hotplug and Virtio - Tetsuya Mukawa by harryvanhaaren
Hotplug and Virtio - Tetsuya MukawaHotplug and Virtio - Tetsuya Mukawa
Hotplug and Virtio - Tetsuya Mukawa
harryvanhaaren713 views
LCA14: LCA14-209: ODP Project Update by Linaro
LCA14: LCA14-209: ODP Project UpdateLCA14: LCA14-209: ODP Project Update
LCA14: LCA14-209: ODP Project Update
Linaro1K views
Dynamic Service Chaining by Tail-f Systems
Dynamic Service Chaining Dynamic Service Chaining
Dynamic Service Chaining
Tail-f Systems13.2K views
DEVNET-1175 OpenDaylight Service Function Chaining by Cisco DevNet
DEVNET-1175	OpenDaylight Service Function ChainingDEVNET-1175	OpenDaylight Service Function Chaining
DEVNET-1175 OpenDaylight Service Function Chaining
Cisco DevNet2.7K views
Learn more about the tremendous value Open Data Plane brings to NFV by Ghodhbane Mohamed Amine
Learn more about the tremendous value Open Data Plane brings to NFVLearn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFV

Similar to SC15 PMIx Birds-of-a-Feather

PMIx Tiered Storage Support by
PMIx Tiered Storage SupportPMIx Tiered Storage Support
PMIx Tiered Storage Supportrcastain
115 views22 slides
SC'16 PMIx BoF Presentation by
SC'16 PMIx BoF PresentationSC'16 PMIx BoF Presentation
SC'16 PMIx BoF Presentationrcastain
123 views34 slides
PMIx Updated Overview by
PMIx Updated OverviewPMIx Updated Overview
PMIx Updated OverviewRalph Castain
351 views34 slides
12 Factor App Methodology by
12 Factor App Methodology12 Factor App Methodology
12 Factor App Methodologylaeshin park
2.3K views42 slides
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind by
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindAvere Systems
364 views29 slides
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows by
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsYong Feng
201 views36 slides

Similar to SC15 PMIx Birds-of-a-Feather(20)

PMIx Tiered Storage Support by rcastain
PMIx Tiered Storage SupportPMIx Tiered Storage Support
PMIx Tiered Storage Support
rcastain115 views
SC'16 PMIx BoF Presentation by rcastain
SC'16 PMIx BoF PresentationSC'16 PMIx BoF Presentation
SC'16 PMIx BoF Presentation
rcastain123 views
12 Factor App Methodology by laeshin park
12 Factor App Methodology12 Factor App Methodology
12 Factor App Methodology
laeshin park2.3K views
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind by Avere Systems
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Avere Systems364 views
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows by Yong Feng
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Yong Feng201 views
Adding Real-time Features to PHP Applications by Ronny López
Adding Real-time Features to PHP ApplicationsAdding Real-time Features to PHP Applications
Adding Real-time Features to PHP Applications
Ronny López2.5K views
PMIx: Bridging the Container Boundary by rcastain
PMIx: Bridging the Container BoundaryPMIx: Bridging the Container Boundary
PMIx: Bridging the Container Boundary
rcastain218 views
iSense Java Summit 2017 - Microservices in action at the Dutch National Police by Bert Jan Schrijver
iSense Java Summit 2017 - Microservices in action at the Dutch National PoliceiSense Java Summit 2017 - Microservices in action at the Dutch National Police
iSense Java Summit 2017 - Microservices in action at the Dutch National Police
Bert Jan Schrijver314 views
Change Management in Hybrid landscapes 2017 by Chris Kernaghan
Change Management in Hybrid landscapes 2017Change Management in Hybrid landscapes 2017
Change Management in Hybrid landscapes 2017
Chris Kernaghan366 views
Event Bus as Backbone for Decoupled Microservice Choreography (JFall 2017) by Lucas Jellema
Event Bus as Backbone for Decoupled Microservice Choreography (JFall 2017)Event Bus as Backbone for Decoupled Microservice Choreography (JFall 2017)
Event Bus as Backbone for Decoupled Microservice Choreography (JFall 2017)
Lucas Jellema2.6K views
[QCon London 2020] The Future of Cloud Native API Gateways - Richard Li by Ambassador Labs
[QCon London 2020] The Future of Cloud Native API Gateways - Richard Li[QCon London 2020] The Future of Cloud Native API Gateways - Richard Li
[QCon London 2020] The Future of Cloud Native API Gateways - Richard Li
Ambassador Labs247 views
Bol.com Tech lab September 2017 - Microservices in action at the Dutch Nation... by Bert Jan Schrijver
Bol.com Tech lab September 2017 - Microservices in action at the Dutch Nation...Bol.com Tech lab September 2017 - Microservices in action at the Dutch Nation...
Bol.com Tech lab September 2017 - Microservices in action at the Dutch Nation...
Bert Jan Schrijver411 views
Dublin JUG February 2018 - Microservices in action at the Dutch National Police by Bert Jan Schrijver
Dublin JUG February 2018 - Microservices in action at the Dutch National PoliceDublin JUG February 2018 - Microservices in action at the Dutch National Police
Dublin JUG February 2018 - Microservices in action at the Dutch National Police
Bert Jan Schrijver144 views
Get There meetup March 2018 - Microservices in action at the Dutch National P... by Bert Jan Schrijver
Get There meetup March 2018 - Microservices in action at the Dutch National P...Get There meetup March 2018 - Microservices in action at the Dutch National P...
Get There meetup March 2018 - Microservices in action at the Dutch National P...
Bert Jan Schrijver230 views
Devoxx PL 2018 - Microservices in action at the Dutch National Police by Bert Jan Schrijver
Devoxx PL 2018 - Microservices in action at the Dutch National PoliceDevoxx PL 2018 - Microservices in action at the Dutch National Police
Devoxx PL 2018 - Microservices in action at the Dutch National Police
Bert Jan Schrijver141 views
Bitfusion Nimbix Dev Summit Heterogeneous Architectures by Subbu Rama
Bitfusion Nimbix Dev Summit Heterogeneous Architectures Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Subbu Rama666 views
Realtime traffic analyser by Alex Moskvin
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyser
Alex Moskvin174 views
A Practical Guide to Selecting a Stream Processing Technology by confluent
A Practical Guide to Selecting a Stream Processing Technology A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology
confluent2K views

Recently uploaded

application of genetic engineering 2.pptx by
application of genetic engineering 2.pptxapplication of genetic engineering 2.pptx
application of genetic engineering 2.pptxSankSurezz
6 views12 slides
CSF -SHEEBA.D presentation.pptx by
CSF -SHEEBA.D presentation.pptxCSF -SHEEBA.D presentation.pptx
CSF -SHEEBA.D presentation.pptxSheebaD7
10 views13 slides
How to be(come) a successful PhD student by
How to be(come) a successful PhD studentHow to be(come) a successful PhD student
How to be(come) a successful PhD studentTom Mens
422 views62 slides
Ethical issues associated with Genetically Modified Crops and Genetically Mod... by
Ethical issues associated with Genetically Modified Crops and Genetically Mod...Ethical issues associated with Genetically Modified Crops and Genetically Mod...
Ethical issues associated with Genetically Modified Crops and Genetically Mod...PunithKumars6
18 views20 slides
Workshop Chemical Robotics ChemAI 231116.pptx by
Workshop Chemical Robotics ChemAI 231116.pptxWorkshop Chemical Robotics ChemAI 231116.pptx
Workshop Chemical Robotics ChemAI 231116.pptxMarco Tibaldi
95 views41 slides
Experimental animal Guinea pigs.pptx by
Experimental animal Guinea pigs.pptxExperimental animal Guinea pigs.pptx
Experimental animal Guinea pigs.pptxMansee Arya
10 views16 slides

Recently uploaded(20)

application of genetic engineering 2.pptx by SankSurezz
application of genetic engineering 2.pptxapplication of genetic engineering 2.pptx
application of genetic engineering 2.pptx
SankSurezz6 views
CSF -SHEEBA.D presentation.pptx by SheebaD7
CSF -SHEEBA.D presentation.pptxCSF -SHEEBA.D presentation.pptx
CSF -SHEEBA.D presentation.pptx
SheebaD710 views
How to be(come) a successful PhD student by Tom Mens
How to be(come) a successful PhD studentHow to be(come) a successful PhD student
How to be(come) a successful PhD student
Tom Mens422 views
Ethical issues associated with Genetically Modified Crops and Genetically Mod... by PunithKumars6
Ethical issues associated with Genetically Modified Crops and Genetically Mod...Ethical issues associated with Genetically Modified Crops and Genetically Mod...
Ethical issues associated with Genetically Modified Crops and Genetically Mod...
PunithKumars618 views
Workshop Chemical Robotics ChemAI 231116.pptx by Marco Tibaldi
Workshop Chemical Robotics ChemAI 231116.pptxWorkshop Chemical Robotics ChemAI 231116.pptx
Workshop Chemical Robotics ChemAI 231116.pptx
Marco Tibaldi95 views
Experimental animal Guinea pigs.pptx by Mansee Arya
Experimental animal Guinea pigs.pptxExperimental animal Guinea pigs.pptx
Experimental animal Guinea pigs.pptx
Mansee Arya10 views
Light Pollution for LVIS students by CWBarthlmew

SC15 PMIx Birds-of-a-Feather

  • 2. Agenda • Overview  Introductions  Vision/objectives • Performance status • Integration status • Roadmap • Malleable application support • Wrap-Up/Open Forum
  • 3. PMIx – PMI exascale Collaborative open source effort led by Intel, Mellanox Technologies, IBM, Adaptive Computing, and SchedMD. New collaborators are most welcome!
  • 4. Contributors • Intel  Ralph Castain  Annapurna Dasari • Mellanox  Joshua Ladd  Artem Polyakov  Elena Shipunova  Nadezhda Kogteva  Igor Ivanov • HP  David Linden  Andy Riebs • IBM  Dave Solt • Adaptive Computing  Gary Brown • RIST  Gilles Gouaillardet • SchedMD  David Bigagli • LANL  Nathan Hjelm
  • 5. Motivation • Exascale launch times are a hot topic  Desire: reduce from many minutes to a few seconds  Target: O(10^6) MPI processes on O(10^5) nodes through MPI_Init in < 30 seconds • New programming models are exploding  Driven by need to efficiently exploit scale vs. resource constraints  Characterized by increased app-RM integration
  • 7. What Is Being Shared? • Job Info (~90%)  Names of participating nodes  Location and ID of procs  Relative ranks of procs (node, job)  Sizes (#procs in job, #procs on each node) • Endpoint info (~10%)  Contact info for each supported fabric [Callouts: known to local RM daemon; can be computed for many fabrics]
  • 8. Stage I • Provide method for RM to share job info • Work with fabric and library implementers to compute endpoint from RM info [Figure: timeline from MPI_Init to MPI_Finalize, time in sec (RRZ, 16 nodes, 8 ppn, rank 0); phases: launch, initialization, exchange MPI contact info, set up MPI structures, barrier, mpi_init completion, barrier]
  • 11. Changing Needs • Notifications/response  Errors, resource changes  Negotiated response • Request allocation changes  shrink/expand • Workflow management  Steered/conditional execution • QoS requests  Power, file system, fabric • Multiple, use-specific libs? (difficult for RM community to support) • Single, multi-purpose lib?
  • 12. Objectives • Establish an independent, open community  Industry, academia, lab • Standalone client/server libraries  Ease adoption, enable broad/consistent support  Open source, non-copy-left  Transparent backward compatibility • Support evolving programming requirements • Enable “Instant On” support  Eliminate time-devouring steps  Provide faster, more scalable operations
  • 13. Today’s Goal • Inform the community • Solicit your input on the roadmap • Get you a little excited • Encourage participation
  • 14. Agenda • Overview  Introductions  Vision/objectives • Performance status • Integration/development status • Roadmap • Malleable/Evolving application support • Wrap-Up/Open Forum
  • 15. PMIx End Users • OSHMEM consumers  In Open MPI OSHMEM: shmem_init = mpi_init + C • Job launch scales as MPI_Init.  Data-driven communication patterns • Assume dense connectivity • MPI consumers  Large class of applications have sparse connectivity • Ideal for an API that supports the direct modex concept
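Note: the contrast between a full modex and a direct modex can be illustrated with a toy count of endpoint lookups. This is a minimal sketch, not PMIx code: the function names are hypothetical, and the ring communication pattern is an assumed example of sparse connectivity. In the full modex every process obtains every peer's endpoint at init; in the direct modex a process obtains a peer's endpoint only the first time it sends to that peer.

```python
def full_modex_fetches(num_procs):
    """Endpoint records retrieved when every process learns about every peer at init."""
    return num_procs * (num_procs - 1)


def direct_modex_fetches(num_procs, sends):
    """Endpoint records retrieved when each process fetches a peer only on first use.

    sends is an iterable of (src, dst) rank pairs in program order.
    """
    cache = [set() for _ in range(num_procs)]  # per-rank set of known peers
    fetches = 0
    for src, dst in sends:
        if dst not in cache[src]:
            cache[src].add(dst)
            fetches += 1
    return fetches


def ring_sends(num_procs, iterations):
    """Assumed sparse pattern: each rank talks only to its two ring neighbors."""
    for _ in range(iterations):
        for rank in range(num_procs):
            yield rank, (rank + 1) % num_procs
            yield rank, (rank - 1) % num_procs


if __name__ == "__main__":
    n = 1024
    print(full_modex_fetches(n))                        # 1047552
    print(direct_modex_fetches(n, ring_sends(n, 10)))   # 2048
```

With this pattern the number of lookups grows with the number of peers actually contacted rather than with the square of the job size. A densely connected application would see no such saving.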
  • 16. What’s been done • Worked closely with customers, OEMs, and open source community to design a scalable API that addresses measured limitations of PMI2  Data-driven design. • Led to the PMIx v1.0 API • Implementation and imminent release of PMIx v1.1  November 2015 release scheduled. • Significant architectural changes in Open MPI to support direct modex  “Add procs” in bulk in MPI_Init  “Add proc” on-demand on first use outside MPI_Init.  Available in the OMPI v2.x release Q1 2016. • Integrated PMIx into Open MPI v2.x  For native launching as well as direct launching under supported RMs.  For mpirun launched jobs, ORTE implements PMIx callbacks.  For srun launched jobs, SLURM implements PMIx callbacks in the PMIx plugin.  Client side framework added to OPAL with components for • Cray PMI • PMI1 • PMI2 • PMIx • backwards compatibility with PMI1 and PMI2. • Implemented and submitted upstream SLURM PMIx plugin  Currently available in SLURM Head  To be released in SLURM 16.05  Client side PMIx Framework and S1, S2, PMIxxx components in OPAL • PMIx unit tests integrated into Jenkins test harness
  • 17. srun --mpi=xxx hello_world [Figure: MPI_Init / Shmem_init time (sec.) vs. number of MPI/OSHMEM processes (0 to 4000); series: PMIx async, PMIx, PMI2. Setup: Open MPI trunk, SLURM 16.05 prerelease with PMIx plugin, PMIx v1.1, BTLs openib,self,vader. Stage I-II]
  • 18. srun --mpi=pmix ./hello_world [Figure: PMIx async projected performance, projected out to 1,000,000 on the horizontal axis. Stage I-II]
  • 19. mpirun/oshrun ./hello_world [Figure: time (millisec) vs. nodes (0 to 70); series: bin-true, orte-no-op, mpi hello world, async modex. Annotation: daemon wireup, linear scaling needs to be addressed]
  • 20. Conclusions • API is implemented and performing well in a variety of settings  Server integrated in OMPI for native launching and in SLURM as PMIx plugin for direct launching. • PMIx shows improvement over other state-of-the-art PMI2 implementations when doing a full modex  Data blobs versus encoded metakeys  Data scoping to reduce the modex size • PMIx-supported direct modex significantly outperforms full modex operations for BTL/MTLs that can support this feature • Direct modex still scales as O(N) • Efforts and energy should be focused on daemon bootstrap problem • Instant-on capabilities could be used to further reduce daemon bootstrap time
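Note: "data scoping to reduce the modex size" can be pictured as tagging each published value with who needs it, and exchanging off-node only what remote peers need. The sketch below is a toy model rather than the PMIx implementation. It assumes three scopes (local, remote, global), and the key names and sizes are invented for illustration.

```python
from enum import Enum


class Scope(Enum):
    LOCAL = "local"    # only peers on the same node need this value
    REMOTE = "remote"  # only peers on other nodes need this value
    GLOBAL = "global"  # every peer needs this value


def modex_payload(entries):
    """Keep only the entries that must leave the node."""
    return {
        key: value
        for key, (value, scope) in entries.items()
        if scope is not Scope.LOCAL
    }


def payload_bytes(payload):
    """Total size of the values in a payload."""
    return sum(len(value) for value in payload.values())


# Hypothetical keys published by one process.
entries = {
    "shmem.segment": (b"x" * 64, Scope.LOCAL),
    "fabric.endpoint": (b"y" * 32, Scope.REMOTE),
    "proc.hostname": (b"z" * 16, Scope.GLOBAL),
}

if __name__ == "__main__":
    everything = payload_bytes({k: v for k, (v, _) in entries.items()})
    scoped = payload_bytes(modex_payload(entries))
    print(everything, scoped)  # 112 48
```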
  • 21. Next Steps • Leverage PMIx features • Reduce modex size with data scoping • Change MTL/PMLs to support direct modex • Investigate the impact of direct modex on densely connected applications • Continue to improve collective performance  Still need to have a scalable solution • Focus more efforts on the daemon bootstrap problem – this becomes the limiting factor moving to exascale  Leverage instant-on here as well
  • 22. Agenda • Overview  Introductions  Vision/objectives • Performance status • Integration/development status • Roadmap • Malleable application support • Wrap-Up/Open Forum
  • 23. Client Implementation Status • PMIx 1.1.1 released  Complete API definition • Future-proof APIs with Info array/length parameter for most calls • Blocking/non-blocking versions of most calls • Picked up by Fedora, others to come • PMIx MPI clients launched/tested with  ORTE (indirect) / ORCM (direct launch)  SLURM servers (direct launch)  IBM PMIx server (direct launch)
  • 24. Server Implementation Status • Server implementation time is greatly reduced through the PMIx convenience library  Handles all server/client interactions  Handles many PMIx requests that can be handled locally  Bundles many off-host requests • Optional  RMs free to implement their own
  • 25. Server Implementation Status • Moab  Integrated PMIx server in scheduler/launcher  Currently integrating PMIx effort with Moab  Scheduled for general availability: no time set • ORTE/ORCM  Full embedded PMIx reference server implementation  Scheduled for release with v2.0
  • 26. Server Implementation Status • SLURM  PMIx support for initial job launch/wireup currently developed & tested  Scheduled for GA: 16.05 release • IBM/LSF  CORAL • PMIx support for initial job launch/wireup currently developed & tested w/PM • Full PMIx support planned for CORAL  Integration to LSF to follow (TBD)
  • 27. Agenda • Overview  Introductions  Vision/objectives • Performance status • Integration/development status • Roadmap • Malleable/Evolving application support • Wrap-Up/Open Forum
  • 28. Scalability • Memory footprint  Distributed database for storing Key-Values • Memory cache, DHT, other models?  One instance of database per node • "zero-message" data access using shared-memory • Launch scaling  Enhanced support for collective operations • Provide pattern to host, host-provided send/recv functions, embedded inter-node comm?  Rely on HSN for launch, wireup support • While app is quiescent, then return to OOB
  • 29. Flexible Allocation Support • Request additional resources  Compute, memory, network, NVM, burst buffer  Immediate, forecast  Expand existing allocation, separate allocation • Return extra resources  No longer required  Will not be used for some specified time, reclaim (handshake) when ready to use • Notification of preemption  Provide opportunity to cleanly pause
  • 30. I/O Support • Asynchronous operations  Anticipatory data fetch, staging  Advise time to complete  Notify upon availability • Storage policy requests  Hot/warm/cold data movement  Desired locations and striping/replication patterns  Persistence of files, shared memory regions across jobs, sessions  ACL to generated data across jobs, sessions
  • 31. Spawn Support • Staging support  Files, libraries required by new apps  Allow RM to consider in scheduler • Current location of data • Time to retrieve and position • Schedule scalable preload • Provisioning requests  Allow RM to consider in selecting resources, minimize startup time due to provisioning  Desired image, packages
  • 32. Network Integration • Quality of service requests  Bandwidth, traffic priority, power constraints  Multi-fabric failover, striping prioritization  Security requirements • Network domain definitions, ACLs • Notification requests  State-of-health  Update process endpoint upon fault recovery • Topology information  Torus, dragonfly, …
  • 33. Power Control/Management • Application requests  Advise of changing workload requirements  Request changes in policy  Specify desired policy for spawned applications  Transfer allocated power to specified job • RM notifications  Need to change power policy • Allow application to accept, request pause  Preemption notification • Provide backward compatibility with PowerAPI
  • 34. PMIx: Fault Tolerance • Notification  App can register for error notifications, incipient faults • RM will notify when app would be impacted • App responds with desired action  Terminate/restart job, wait for checkpoint, etc. • RM/app negotiate final response  App can notify RM of errors • RM will notify specified, registered procs • Restart support  Specify source (remote NVM checkpoint, global filesystem, etc)  Location hints/requests  Entire job, specific processes
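Note: the notification flow on this slide (app registers, RM notifies, app answers with a desired action, RM and app settle on a final response) can be sketched as a small callback registry. Everything here is hypothetical: the class, the error name and the action strings are illustrative and are not part of any PMIx API. The "negotiation" is reduced to the RM falling back to a default when the requested action is not one it allows.

```python
class ResourceManager:
    """Toy RM that collects desired actions from registered applications."""

    def __init__(self, allowed_actions):
        self.allowed = set(allowed_actions)
        self.handlers = {}  # error name -> list of application callbacks

    def register(self, error, callback):
        """An application registers interest in one class of error."""
        self.handlers.setdefault(error, []).append(callback)

    def notify(self, error, default="terminate"):
        """Notify registered apps and return the final action for each one."""
        final = []
        for callback in self.handlers.get(error, []):
            wanted = callback(error)
            # Grant the request if the RM permits it, otherwise use the default.
            final.append(wanted if wanted in self.allowed else default)
        return final


if __name__ == "__main__":
    rm = ResourceManager({"terminate", "restart", "wait_for_checkpoint"})
    rm.register("node_failure_predicted", lambda err: "wait_for_checkpoint")
    rm.register("node_failure_predicted", lambda err: "migrate")
    print(rm.notify("node_failure_predicted"))
    # ['wait_for_checkpoint', 'terminate']
```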
  • 35. Agenda • Overview  Introductions  Vision/objectives • Performance status • Integration/development status • Roadmap • Malleable/Evolving application support • Wrap-Up/Open Forum
  • 36. Job Types • Rigid • Moldable • Malleable • Evolving • Adaptive (Malleable + Evolving)
  • 37. Job Type Characteristics • Resource Allocation Type  Static  Dynamic • Who decides vs. when it is decided  User, at job submission (static allocation): Rigid  User, during job execution (dynamic allocation): Evolving  Scheduler, at job submission (static allocation): Moldable  Scheduler, during job execution (dynamic allocation): Malleable
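Note: the two-by-two classification on this slide maps directly onto a lookup table. The sketch below encodes it, together with the deck's definition of an adaptive job as malleable plus evolving. The function names and the string labels for the two axes are chosen here for illustration.

```python
# (who decides, when it is decided) -> job type, as laid out on the slide.
JOB_TYPES = {
    ("user", "submission"): "rigid",
    ("scheduler", "submission"): "moldable",
    ("user", "execution"): "evolving",
    ("scheduler", "execution"): "malleable",
}


def job_type(who_decides, when_decided):
    """Look up the job type for one combination of the two axes."""
    return JOB_TYPES[(who_decides, when_decided)]


def is_adaptive(types):
    """A job is adaptive when it is both malleable and evolving."""
    return {"malleable", "evolving"} <= set(types)


if __name__ == "__main__":
    print(job_type("scheduler", "execution"))     # malleable
    print(is_adaptive(["malleable", "evolving"]))  # True
```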
  • 38. Rigid Job • Job Resources  4 nodes  60 minutes [Chart: nodes (1 to 8) over time (0 to 240 min, in steps of 15)]
  • 39. Moldable Job • Job Resources  4 nodes/60 min  2 nodes/120 min  1 node/240 min [Chart: nodes (1 to 8) over time (0 to 240 min, in steps of 15)]
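Note: the three moldable shapes on this slide all represent the same amount of work, 240 node-minutes. A scheduler can pick whichever shape fits the nodes that are free at submission. The sketch below checks the arithmetic and picks the shortest feasible shape. The selection rule (shortest runtime that fits) is an assumption for illustration, not a description of any particular scheduler.

```python
# (nodes, minutes) shapes listed on the slide.
OPTIONS = [(4, 60), (2, 120), (1, 240)]


def node_minutes(nodes, minutes):
    """Resource consumption of one shape."""
    return nodes * minutes


def shortest_feasible(options, free_nodes):
    """Return the (nodes, minutes) shape with the shortest runtime that fits."""
    feasible = [opt for opt in options if opt[0] <= free_nodes]
    return min(feasible, key=lambda opt: opt[1], default=None)


if __name__ == "__main__":
    print([node_minutes(n, m) for n, m in OPTIONS])  # [240, 240, 240]
    print(shortest_feasible(OPTIONS, free_nodes=3))  # (2, 120)
```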
  • 40. Malleable Job • Job Resources  1-8 nodes  1000 node-min [Chart: nodes (1 to 8) over time (0 to 240 min, in steps of 15)]
  • 41. Evolving Job • Phases A, B, C and D (each phase 2x nodes of previous phase) • Job Resources  1-8 nodes  1000 node-min [Chart: nodes (1 to 8) over time (0 to 240 min, in steps of 15), with phases A, B, C, D]
  • 42. Adaptive Job • Phases A, B, C and D (each phase 2x nodes of previous phase) • Scheduler takes 4 nodes halfway through Phase D, and phase D2 takes 2x as long as phase D1 • Job Resources  1-8 nodes  1000 node-min [Chart: nodes (1 to 8) over time (0 to 240 min, in steps of 15), with phases A, B, C, D1, D2]
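Note: the statement that phase D2 takes twice as long as D1 follows from conservation of work if the application scales perfectly. The same half of phase D's work, run on 4 nodes instead of 8, needs twice the time. The sketch below works through that arithmetic. The 30-minute phase length is a made-up figure, and perfect scaling is an assumption.

```python
def phase_time(work_node_min, nodes):
    """Minutes needed to finish a given amount of work on a node count,
    assuming perfect scaling."""
    return work_node_min / nodes


# Hypothetical: phase D would need 30 minutes on all 8 nodes.
PHASE_D_WORK = 8 * 30  # node-minutes

d1 = phase_time(PHASE_D_WORK / 2, nodes=8)  # first half, before nodes are taken
d2 = phase_time(PHASE_D_WORK / 2, nodes=4)  # second half, on the 4 nodes left

if __name__ == "__main__":
    print(d1, d2, d2 / d1)  # 15.0 30.0 2.0
```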
  • 43. Motivations • New Programming Models • New Algorithmic Techniques • Unconventional Cluster Architectures
  • 44. Adaptive Mesh Refinement • Granularity: Coarse, Medium, Fine • Node Allocation: Few (1), Some (4), Many (64)
  • 45. Master/Slaves [Diagram: one master connected to many slaves]
  • 46. Secondary Simulations [Chart: nodes (1 to 8) over time (0 to 240 min, in steps of 15); labels: Primary Simulation, Sim2 (four instances)]
  • 47. Unconventional Architectures • Cluster-Booster  Multi-core “Cluster” and many-core “Booster” in the same network domain  Multi-core jobs dynamically burst out to parallel “booster” nodes with accelerators
  • 48. Apps, RTEs and Archs Applications • Astrophysics • Brain simulation • Climate simulation • Flow solvers (QuadFlow) • Hydrodynamics (Lulesh) • Molecular Dynamics (NAMD) • Water reservoir storage/flow • Wave propagation (Wave2D) RTEs • Charm++ • OmpSs • Uintah • Radical-Pilot Architectures • EU DEEP/DEEP-ER
  • 49. Scheduler/Malleable Job Dialog • Expand resource allocation • Contract resource allocation [Diagram: messages between Scheduler and Application: Expand (more) / Response, Contract (less) / Response]
  • 50. Scheduler/Evolving Job Dialog • Grow resource allocation • Shrink resource allocation [Diagram: messages between Application and Scheduler: Grow (more) / Response, Shrink (less) / Response]
  • 51. Adaptive Job Race Condition • Reason for naming convention  Prevent ambiguity and confusion [Diagram: Application and Scheduler send Grow (more) and Expand (more) at the same time; outcome marked “?”]
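Note: one reading of the naming convention is that each verb identifies who started the dialog. Expand and contract belong to the scheduler-driven (malleable) dialog, and grow and shrink belong to the application-driven (evolving) dialog. If both sides send a request for "more" at the same moment, the distinct names let each side recognize the incoming message as a new request from the other party rather than an echo of its own. The sketch below encodes that reading. The mapping of verbs to initiators is inferred from the job-type slides, and the function is hypothetical.

```python
# Which party initiates each request, as inferred from the job-type slides.
INITIATOR = {
    "expand": "scheduler",
    "contract": "scheduler",
    "grow": "application",
    "shrink": "application",
}


def classify(message, receiver):
    """Identify the sender of an incoming request from its name alone."""
    initiator = INITIATOR[message]
    if initiator == receiver:
        raise ValueError(f"{receiver} cannot receive its own request '{message}'")
    return f"request from {initiator}"


if __name__ == "__main__":
    # The application has just sent "grow" and now receives "expand".
    print(classify("expand", receiver="application"))  # request from scheduler
    # The scheduler has just sent "expand" and now receives "grow".
    print(classify("grow", receiver="scheduler"))      # request from application
```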
  • 52. Need for Standard API • MPI: standard API for parallel communication • Need standard API for application / scheduler resource management dialogs  Same API for applications  Scheduler-specific API implementations • Scheduler and Malleable/Evolvable Application Dialog (SMEAD) API  Make part of PMIx  Need application use cases
  • 53. Interested Parties • Adaptive Computing (Moab scheduler, TORQUE RM, Nitro) • Altair (PBS Pro scheduler/RM) • Argonne National Laboratory (Cobalt scheduler) • HLRS at University of Stuttgart • Jülich Supercomputing Centre (DEEP-ER) • Lawrence Livermore National Laboratory (Flux scheduler) • Partec (ParaStation) • SchedMD (Slurm scheduler/RM) • TU Darmstadt Laboratory for Parallel Programming • UK Atomic Weapons Establishment (AWE) • University of Cambridge COSMOS • University of Illinois at Urbana-Champaign Parallel Programming Laboratory (Charm++ RTE) • University of Utah SCI Institute (Uintah RTE)
  • 54. URLs • Need your help to design a standard API!  Malleable/Evolving Application Use Case Survey  http://goo.gl/forms/lq85y3SkV3 (Google Form) • Info on adaptive job types and scheduling  4-part blog about malleable / evolving / adaptive jobs and schedulers  http://www.adaptivecomputing.com/series/malleable-and-evolving-jobs/
  • 55. Agenda • Overview  Introductions  Vision/objectives • Performance status • Integration/development status • Roadmap • Malleable/Evolving application support • Wrap-Up/Open Forum
  • 56. Bottom Line • We now have an interface library the RMs will support for application-directed requests • Need to collaboratively define what we want to do with it • For any programming model: MPI, OSHMEM, PGAS, …
  • 57. Contribute or Follow Along!  Project: https://pmix.github.io/master  Code: https://github.com/pmix Contributors/collaborators are welcomed!