SC'16 PMIx BoF Presentation

Process Management Interface – Exascale
Agenda
• Since we last met…
  - Address some common questions
  - Outline PMIx standards process
• PMIx v1.x release series
  - What has been included and planned
  - Review launch performance status
• Roadmap
  - Features in the pipeline
  - Potential future features
• Questions/comments/discussion
Charter?
• Define
  - set of agnostic APIs (not affiliated with a specific model's code base) to support application ↔ system management software (SMS) interactions
• Develop
  - an open source (non-copyleft licensed) standalone “convenience” library to facilitate adoption
• Retain
  - transparent compatibility across all PMI/PMIx versions
• Support
  - the Instant On initiative
• Work
  - to define/implement new APIs for evolving programming models
What Is PMIx?
• Standardized APIs
  - Four header files (client, server, common, tool)
  - Enable portability across environments
  - Support interactions between applications and system management stack
• Convenience library
  - Facilitate adoption
  - Serves as validation platform for standard
• Community
Required: Caveat
• Containerized operations
  - Require cross-boundary compatibility
  - Wireup library of containerized app must be compatible with the local resource manager
• Issues occur when migrating
  - Build under one environment using custom implementation
  - Move to another environment using different implementation
• Convenience library mitigates the problem
Why Not Part of MPI Forum?
• PMIx is agnostic
  - No concept of “communicator”
  - No understanding of MPI
  - Used by non-MPI libraries
• Discussions underway
  - Bring it into Forum process in some appropriate fashion
PMIx “Standards” Process
• Modifications/additions
  - Proposed as RFC
  - Include prototype implementation
• Pull request to convenience library
  - Notification sent to mailing list
• Reviews conducted
  - RFC and implementation
  - Continues until consensus emerges
• Approval given
  - Developer telecon (2x/week)
PMIx Numbering
Major . Minor . Release
• Version of Standard
• Track convenience library revisions
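One possible reading of this numbering scheme, sketched in Python: the Major.Minor pair names the version of the standard, and the Release field tracks revisions of the convenience library. That split is an interpretation of the slide, and `parse_pmix_version` is a hypothetical helper, not part of PMIx.

```python
from typing import NamedTuple


class PMIxVersion(NamedTuple):
    major: int
    minor: int
    release: int

    @property
    def standard(self) -> str:
        # Assumed reading of the slide: Major.Minor names the standard version.
        return f"{self.major}.{self.minor}"


def parse_pmix_version(text: str) -> PMIxVersion:
    """Split a 'Major.Minor.Release' string into its three integer fields."""
    parts = text.lstrip("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected Major.Minor.Release, got {text!r}")
    major, minor, release = (int(p) for p in parts)
    return PMIxVersion(major, minor, release)


v = parse_pmix_version("v1.1.5")
print(v.standard, v.release)
```

Because the result is a tuple, two versions compare field by field, which is convenient for "is this release at least X" checks.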
Regression Testing?
• Limited direct capability
  - Run basic API tests on each PR
• Extensive embedded testing
  - Open MPI includes PMIx master, regularly updated
  - 20k+ tests run every night
    • Tests all spawn, wireup, publish/lookup, connect/disconnect APIs
    • Not 100% code coverage
Adoption?
• Already released
  - SLURM 16.05 (PMIx v1.1.5)
• Planned
  - IBM, Fujitsu, Adaptive Solutions, Altair, Microsoft
• Reference server
  - Provides surrogate support until native support becomes available
  - Supports full PMIx standard, limited by RM capabilities
  - Launches network of PMIx servers across allocation
Agenda
• Since we last met…
  - Address some common questions
  - Outline PMIx standards process
• PMIx v1.x release series (Artem Polyakov, Mellanox)
  - What has been included and planned
  - Review launch performance status
• Roadmap
  - Features in the pipeline
  - Potential future features
• Questions/comments/discussion
PMIx/UCX job-start use case
[Figure: MPI_Init time (sec, 0 to 0.6) vs. number of nodes (0 to 35), one curve per key exchange type: collective, direct-fetch, direct-fetch/async*]
Hardware:
• 32 nodes
• 2 processors (Xeon E5-2680 v2)
• 20 cores per node
• 1 proc per core
Software:
• Open MPI v2.1 (modified so the barrier at the end of MPI_Init can be skipped)
• PMIx v1.1.5
• UCX (f3f9ad7)
* direct-fetch/async assumes no synchronization barrier inside MPI_Init.
PMIx/UCX job-start use case (annotated)
[Figure: same plot of MPI_Init time (sec) vs. number of nodes for collective, direct-fetch, and direct-fetch/async key exchange, with two callouts: “allgatherv” on all submitted keys; synchronization overhead]
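The exchange styles named in the plot can be contrasted with a toy model. This is a sketch in Python, not PMIx code: `collective_exchange` and `direct_fetch` are hypothetical names, and plain dictionaries stand in for per-process key stores. In the collective style every process receives every submitted key up front; in the direct-fetch style a key moves only when a process asks for it.

```python
def collective_exchange(local_keys):
    """Model an allgatherv-style exchange: every rank ends up with all keys."""
    everything = {}
    for rank, keys in local_keys.items():
        for name, value in keys.items():
            everything[(rank, name)] = value
    # Each rank holds a full copy once the collective completes.
    return {rank: dict(everything) for rank in local_keys}


def direct_fetch(local_keys, cache, requester, owner, name):
    """Model on-demand retrieval: fetch one key from its owner when needed."""
    store = cache.setdefault(requester, {})
    if (owner, name) not in store:
        store[(owner, name)] = local_keys[owner][name]
    return store[(owner, name)]


local_keys = {
    0: {"endpoint": "addr-0"},
    1: {"endpoint": "addr-1"},
    2: {"endpoint": "addr-2"},
}

full = collective_exchange(local_keys)
cache = {}
value = direct_fetch(local_keys, cache, requester=0, owner=2, name="endpoint")

print(len(full[0]))   # keys held by rank 0 after the collective
print(len(cache[0]))  # keys held by rank 0 after one direct fetch
```

The model ignores timing entirely; it only shows how much data each style moves to a process that talks to a single peer.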
v1.2.0
• Extension of v1.1.5
  - v1.1.5
    • Each proc stores own copy of data
  - v1.2
    • Data stored in shared memory owned by PMIx server
    • Each proc has read-only access
• Benefits
  - Minimizes memory footprint
  - Faster launch times
Shared memory data storage (architecture)
[Diagram components: RTE daemon containing the PMIx-server; two MPI processes, each with a PMIx-client; PMIx shared mem (store blobs); usock links]
• Server provides all the data through the shared memory
• Each process can fetch all the data with 0 server-side CPU cycles!
• With direct key fetching, if a key is not found in the shared memory, the process requests it from the server using the regular messaging mechanism.
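The lookup order described above can be sketched as follows. This is a toy model in Python under stated assumptions: `Server` and `Client` are hypothetical classes, a read-only mapping stands in for the shared-memory region, and a method call stands in for the messaging round trip.

```python
from types import MappingProxyType


class Server:
    """Stand-in for the PMIx server; owns the data and counts message requests."""

    def __init__(self, data):
        self._data = dict(data)
        self.requests = 0

    def request(self, key):
        # Regular messaging path: costs the server some work per call.
        self.requests += 1
        return self._data[key]


class Client:
    """Stand-in for a PMIx client with read-only access to the shared region."""

    def __init__(self, shared_region, server):
        self._shared = shared_region
        self._server = server

    def get(self, key):
        # Fast path: read from the shared region without involving the server.
        if key in self._shared:
            return self._shared[key]
        # Fallback: the key is not in shared memory, so ask the server.
        return self._server.request(key)


server = Server({"a": 1, "b": 2})
shared = MappingProxyType({"a": 1})  # only "a" has been published so far
client = Client(shared, server)

print(client.get("a"), server.requests)
print(client.get("b"), server.requests)
```

The request counter makes the point of the design visible: reads served from the shared region never touch the server.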
Shared memory data storage (synthetic performance test)
[Figure: PMIx_Get(all) time (sec, 0 to 0.7) vs. number of processes (0 to 700), two curves: Messages and Shmem]
Hardware:
• 32 nodes
• 2 processors (Intel Xeon E5-2680 v2)
• 20 cores per node
• 1 proc per core
Software:
• Open MPI v2.1
• PMIx v1.2
Test (https://github.com/pmix/master/tree/master/contrib/perf_tools):
• 10 keys per process
• 100-element arrays of integers
Shared memory data storage (synthetic performance test) [2]

Nodes  Procs  Messages (us)            Shmem (us)
              local key   remote key   local key   remote key
1      20     8.8         -            6.4         -
2      40     8.9         9.2          6.5         6.5
4      80     8.8         9.2          6.5         6.5
8      160    8.7         9.2          6.4         6.4
16     320    8.4         9.2          6.5         6.5
32     640    8.2         9.2          6.5         6.5

Advantages:
• Stable timing for individual key access (no difference between local and remote key access)
• Up to 30% improvement for the remote key fetch
• Significant CPU offload on SMP systems with large core counts
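The "up to 30%" figure can be checked against the remote-key columns of the table. The short Python calculation below uses only values from the table; `improvement_pct` is a helper name introduced here for illustration.

```python
# Remote-key access times in microseconds, copied from the table:
# nodes -> (messages, shmem). The 1-node row has no remote-key entries.
remote_us = {
    2: (9.2, 6.5),
    4: (9.2, 6.5),
    8: (9.2, 6.4),
    16: (9.2, 6.5),
    32: (9.2, 6.5),
}


def improvement_pct(messages, shmem):
    """Relative reduction in access time when moving from messages to shmem."""
    return 100.0 * (messages - shmem) / messages


for nodes, (msg, shm) in remote_us.items():
    print(nodes, round(improvement_pct(msg, shm), 1))
```

The reductions come out at roughly 29 to 30 percent across node counts, consistent with the slide's claim.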
Shared memory data storage (synthetic performance test) [3]

Average time per operation (ms), from the bar-chart data labels:

Cluster  Operation  PMI2   PMIx msg  PMIx shmem
CL1      avg(Put)   0.20   0.07      0.07
CL1      avg(Get)   0.22   0.08      0.07
CL2      avg(Put)   0.37   0.06      0.06
CL2      avg(Get)   0.32   0.11      0.06

CL1 Hardware:
• 15 nodes
• 2 processors (Intel Xeon X5570)
• 8 cores per node
• 1 proc per core
CL2 Hardware:
• 64 nodes
• 2 processors (Intel Xeon E5-2697 v3)
• 28 cores per node
• 1 proc per core
PMIx Roadmap
Timeline (from 2014), labeled “RM Production Releases”:
• 1/2016: 1.1.3
• 6/2016: 1.1.4 (bug fixes)
• 8/2016: 1.1.5 (bug fixes)
• 11/2016: 1.2.0 (shared memory datastore)
In Pipeline (David Solt, IBM)
Tool Support
• Tool connection support
  - Allow tools to connect to local PMIx server
  - Specify system vs application
[Diagram components: Tool, System PMIx server, RM, mpirun/orted, P]
Tool Support
• Query
  - Network topology
    • Array of proc network-relative locations
    • Overall topology (e.g., “dragonfly”)
  - Running jobs
    • Currently executing job namespaces
    • Array of proc location, status, PID
  - Resources
    • Available system resources
    • Array of proc location, resource utilization (à la “top”)
  - Queue status
    • Current scheduler queue backlog
Examples
Debuggers?
New Flexibility
• Plugin architecture
  - DLL-based system
  - Supports proprietary binary components
  - Allows multiple implementations of common functionality
    • Buffer pack/unpack operations
    • Communications (TCP, shared memory, …)
    • Security
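The idea of several interchangeable implementations behind one interface can be sketched with a small registry. This is an illustration in Python, not the PMIx plugin system: the slide describes a DLL-based design, whereas this sketch registers classes in-process, and the framework name `pack`, the priorities, and both packer classes are hypothetical.

```python
import json
import pickle

# framework name -> list of (priority, plugin name, plugin class)
REGISTRY = {}


def register(framework, name, priority):
    """Class decorator that adds a plugin to the registry."""
    def wrap(cls):
        REGISTRY.setdefault(framework, []).append((priority, name, cls))
        return cls
    return wrap


@register("pack", "json", priority=10)
class JsonPacker:
    def pack(self, obj):
        return json.dumps(obj).encode("utf-8")

    def unpack(self, data):
        return json.loads(data.decode("utf-8"))


@register("pack", "pickle", priority=5)
class PicklePacker:
    def pack(self, obj):
        return pickle.dumps(obj)

    def unpack(self, data):
        return pickle.loads(data)


def select(framework, name=None):
    """Return an instance of the named plugin, or the highest-priority one."""
    candidates = REGISTRY[framework]
    if name is not None:
        for _, plugin_name, cls in candidates:
            if plugin_name == name:
                return cls()
        raise KeyError(f"no {framework} plugin named {name!r}")
    _, _, cls = max(candidates, key=lambda entry: entry[0])
    return cls()


packer = select("pack")
print(packer.unpack(packer.pack({"rank": 3})))
```

Callers depend only on `pack` and `unpack`, so a site could swap in a different implementation without touching them.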
Obsolescence Protection
• Plugin architecture
• Cross-version support
  - Automatic detection of client/server version
  - Properly adjust for changes in structures, protocols
  - Ensure clients always get what they can understand
  - Backward support to the v1.1.5 level
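A minimal sketch of version negotiation along these lines, in Python. The rule used here (both peers settle on the older of the two versions) is an assumption made for illustration, and `negotiate` is a hypothetical function; only the v1.1.5 floor comes from the slide.

```python
MIN_SUPPORTED = (1, 1, 5)  # backward-support floor named on the slide


def negotiate(client_version, server_version):
    """Pick the (major, minor, release) level that both peers can speak."""
    for label, version in (("client", client_version), ("server", server_version)):
        if version < MIN_SUPPORTED:
            raise ValueError(f"{label} version {version} is older than {MIN_SUPPORTED}")
    # Assumption: the older peer's level is the one both sides understand.
    return min(client_version, server_version)


print(negotiate((1, 1, 5), (1, 2, 0)))
```

Tuples compare element by element, so `(1, 1, 5) < (1, 2, 0)` holds without any custom comparison code.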
Notification
• Plugin architecture
• Cross-version support
• Event notification
  - System generated, app generated
  - Resolves issues in original API, implementation
  - Register for broad range of events
    • Constrained by availability of backend support
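Registering for events and receiving both system-generated and application-generated notifications can be modeled with a small dispatcher. This Python sketch is illustrative only: `EventBus`, the event code strings, and the handler signature are hypothetical and do not reproduce the PMIx event API.

```python
from collections import defaultdict


class EventBus:
    """Deliver events to every handler registered for the event's code."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, codes, handler):
        # One handler may register for a range of event codes.
        for code in codes:
            self._handlers[code].append(handler)

    def notify(self, code, source, info=None):
        """Invoke matching handlers; return how many were called."""
        delivered = 0
        for handler in self._handlers.get(code, []):
            handler(code, source, info or {})
            delivered += 1
        return delivered


seen = []
bus = EventBus()
bus.register(["node-down", "proc-aborted"], lambda code, src, info: seen.append((code, src)))

bus.notify("node-down", source="system", info={"node": "n012"})
bus.notify("proc-aborted", source="app")
missed = bus.notify("queue-full", source="system")  # nobody registered for this

print(seen, missed)
```

An event with no registered handler is simply dropped, which mirrors the caveat that delivery depends on what the backend can report.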
Logging
• Plugin architecture
• Cross-version support
• Event notification
• Log data
  - Store desired data in system data store(s)
    • Specify hot/warm/cold, local/remote, database and type of database, …
  - Log output to stdout/err
  - Supports binary and non-binary data
    • Heterogeneity taken care of for you
PMIx Roadmap
Future (Ralph Castain, Intel)
Future Features
Reference Server
• Initial version: DVM
  - Interconnected PMIx servers
  - High-speed, resilient collectives
    • bcast, allgather/barrier
• Future updates: “fill” mode
  - Servers proxy clients to host RM
  - Complete missing host functionality
Winter 2017
Future Features
Debugger Support
• Ongoing discussions with MPI Forum Tools WG
  - Implement proposed MPIR2 interface
  - Enhance scalability
• Exploit tool connection
  - Obtain proctable info
  - Use PMIx_Spawn to launch daemons, auto-wireup, localize proctable retrieval
• Extend available supporting info
  - Network topology, bandwidth utilization
  - Event notification
Winter 2017
Future Features
Network Support Framework
• Interface to 3rd party libraries
• Enable support for network features
  - Precondition of network security keys
  - Retrieval of endpoint assignments, topology
• Data made available
  - In initial job info returned at proc start
  - Retrieved by Query
Spring 2017
Future Features
IO Support
• Reduce launch time
  - Current practices
    • Reactive cache/forward
    • Static builds
  - Proactive pre-positioning
    • Examine provided job/script
    • Return array of binaries and libraries required for execution
• Enhance execution
  - Request async file positioning
    • Callback when ready
  - Specify persistence options
Summer 2017
Future Features
Generalized Data Store (GDS)
• Abstracted view of data store
  - Multiple plugins for different implementations
    • Local (hot) storage
    • Distributed (warm) models
    • Database (cold) storage
• Explore alternative paradigms
  - Job info, wireup data
  - Publish/lookup
  - Log
Fall 2017
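The abstracted hot/warm/cold view can be illustrated with a toy tiered lookup. This Python sketch is an assumption-laden model, not the planned GDS design: `TieredStore` is a hypothetical class, each tier is a plain dictionary, and the search order (hot, then warm, then cold) is chosen here for illustration.

```python
class TieredStore:
    """Look up keys across ordered tiers, fastest tier first."""

    def __init__(self, tier_names=("hot", "warm", "cold")):
        # Dictionaries keep insertion order, so tiers are searched as listed.
        self._tiers = {name: {} for name in tier_names}

    def put(self, key, value, tier):
        self._tiers[tier][key] = value

    def get(self, key):
        """Return (value, tier name) from the first tier holding the key."""
        for name, data in self._tiers.items():
            if key in data:
                return data[key], name
        raise KeyError(key)


store = TieredStore()
store.put("job.size", 640, tier="hot")
store.put("published.port", "tcp://host:1234", tier="warm")
store.put("log.0001", b"\x00\x01", tier="cold")

print(store.get("job.size"))
print(store.get("log.0001"))
```

Callers see one `get` regardless of where the data lives, which is the property the abstraction is meant to provide.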
Open Discussion
We now have an interface library the RMs will support for application-directed requests.
We need to collaboratively define what we want to do with it.
Project: https://pmix.github.io/master
Code: https://github.com/pmix
1 of 34

Recommended

Embedded Systems: Lecture 8: Lab 1: Building a Raspberry Pi Based WiFi AP by
Embedded Systems: Lecture 8: Lab 1: Building a Raspberry Pi Based WiFi APEmbedded Systems: Lecture 8: Lab 1: Building a Raspberry Pi Based WiFi AP
Embedded Systems: Lecture 8: Lab 1: Building a Raspberry Pi Based WiFi APAhmed El-Arabawy
2.3K views75 slides
Ch 2: TCP/IP Concepts Review by
Ch 2: TCP/IP Concepts ReviewCh 2: TCP/IP Concepts Review
Ch 2: TCP/IP Concepts ReviewSam Bowne
3.9K views41 slides
Software Defined Networking: Network Virtualization by
Software Defined Networking: Network VirtualizationSoftware Defined Networking: Network Virtualization
Software Defined Networking: Network VirtualizationNetCraftsmen
1.9K views45 slides
Cloud Architecture by
Cloud ArchitectureCloud Architecture
Cloud ArchitectureRichard Harvey
554 views10 slides
Raga_SDN_NSX_1 by
Raga_SDN_NSX_1Raga_SDN_NSX_1
Raga_SDN_NSX_1Ranjith Kumar
144 views40 slides
2017 - LISA - LinkedIn's Distributed Firewall (DFW) by
2017 - LISA - LinkedIn's Distributed Firewall (DFW)2017 - LISA - LinkedIn's Distributed Firewall (DFW)
2017 - LISA - LinkedIn's Distributed Firewall (DFW)Mike Svoboda
9.2K views43 slides

More Related Content

What's hot

OpenDataPlane - Bill Fischofer by
OpenDataPlane - Bill FischoferOpenDataPlane - Bill Fischofer
OpenDataPlane - Bill Fischoferharryvanhaaren
599 views12 slides
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN by
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDNTech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDNnvirters
2.2K views20 slides
KVM Enhancements for OPNFV by
KVM Enhancements for OPNFVKVM Enhancements for OPNFV
KVM Enhancements for OPNFVOPNFV
902 views20 slides
Fastsocket Linxiaofeng by
Fastsocket LinxiaofengFastsocket Linxiaofeng
Fastsocket LinxiaofengMichael Zhang
2K views51 slides
SDN, Network Virtualization and the Software Defined Data Center – Brad Hedlund by
SDN, Network Virtualization and the Software Defined Data Center – Brad HedlundSDN, Network Virtualization and the Software Defined Data Center – Brad Hedlund
SDN, Network Virtualization and the Software Defined Data Center – Brad HedlundChef Software, Inc.
4K views13 slides
What's new in Neutron Juno by
What's new in Neutron JunoWhat's new in Neutron Juno
What's new in Neutron JunoJaume Devesa Gomez
1.1K views31 slides

What's hot(20)

OpenDataPlane - Bill Fischofer by harryvanhaaren
OpenDataPlane - Bill FischoferOpenDataPlane - Bill Fischofer
OpenDataPlane - Bill Fischofer
harryvanhaaren599 views
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN by nvirters
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDNTech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN
nvirters2.2K views
KVM Enhancements for OPNFV by OPNFV
KVM Enhancements for OPNFVKVM Enhancements for OPNFV
KVM Enhancements for OPNFV
OPNFV902 views
SDN, Network Virtualization and the Software Defined Data Center – Brad Hedlund by Chef Software, Inc.
SDN, Network Virtualization and the Software Defined Data Center – Brad HedlundSDN, Network Virtualization and the Software Defined Data Center – Brad Hedlund
SDN, Network Virtualization and the Software Defined Data Center – Brad Hedlund
Accelerated dataplanes integration and deployment by OPNFV
Accelerated dataplanes integration and deploymentAccelerated dataplanes integration and deployment
Accelerated dataplanes integration and deployment
OPNFV406 views
IBM MQ High Availabillity and Disaster Recovery (2017 version) by MarkTaylorIBM
IBM MQ High Availabillity and Disaster Recovery (2017 version)IBM MQ High Availabillity and Disaster Recovery (2017 version)
IBM MQ High Availabillity and Disaster Recovery (2017 version)
MarkTaylorIBM2K views
Content Addressable NDN Repository - checkpoint by Shi Junxiao
Content Addressable NDN Repository - checkpointContent Addressable NDN Repository - checkpoint
Content Addressable NDN Repository - checkpoint
Shi Junxiao784 views
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io by OPNFV
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
OPNFV318 views
OSMC 2021 | Monitoring Open Source Hardware by NETWAYS
OSMC 2021 | Monitoring Open Source HardwareOSMC 2021 | Monitoring Open Source Hardware
OSMC 2021 | Monitoring Open Source Hardware
NETWAYS79 views
StreamSleuth 100 GbE Network Packet Processing Appliance by Marcus Weddle
StreamSleuth 100 GbE Network Packet Processing ApplianceStreamSleuth 100 GbE Network Packet Processing Appliance
StreamSleuth 100 GbE Network Packet Processing Appliance
Marcus Weddle195 views
Hotplug and Virtio - Tetsuya Mukawa by harryvanhaaren
Hotplug and Virtio - Tetsuya MukawaHotplug and Virtio - Tetsuya Mukawa
Hotplug and Virtio - Tetsuya Mukawa
harryvanhaaren713 views
Log and control all service-to-service traffic in one place (Kelvin Wong) by London Microservices
Log and control all service-to-service traffic in one place (Kelvin Wong)Log and control all service-to-service traffic in one place (Kelvin Wong)
Log and control all service-to-service traffic in one place (Kelvin Wong)
Balázs Bucsay - XFLTReaT: Building a Tunnel by hacktivity
Balázs Bucsay - XFLTReaT: Building a TunnelBalázs Bucsay - XFLTReaT: Building a Tunnel
Balázs Bucsay - XFLTReaT: Building a Tunnel
hacktivity189 views

Similar to SC'16 PMIx BoF Presentation

PMIx Tiered Storage Support by
PMIx Tiered Storage SupportPMIx Tiered Storage Support
PMIx Tiered Storage Supportrcastain
115 views22 slides
SC15 PMIx Birds-of-a-Feather by
SC15 PMIx Birds-of-a-FeatherSC15 PMIx Birds-of-a-Feather
SC15 PMIx Birds-of-a-Featherrcastain
1.2K views57 slides
Realtime traffic analyser by
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyserAlex Moskvin
174 views56 slides
WSO2 Microservices Framework for Java - Product Overview by
WSO2 Microservices Framework for Java - Product OverviewWSO2 Microservices Framework for Java - Product Overview
WSO2 Microservices Framework for Java - Product OverviewWSO2
551 views34 slides
PMIx: Bridging the Container Boundary by
PMIx: Bridging the Container BoundaryPMIx: Bridging the Container Boundary
PMIx: Bridging the Container Boundaryrcastain
218 views23 slides
StarlingX - A Platform for the Distributed Edge | Ildiko Vancsa by
StarlingX - A Platform for the Distributed Edge | Ildiko VancsaStarlingX - A Platform for the Distributed Edge | Ildiko Vancsa
StarlingX - A Platform for the Distributed Edge | Ildiko VancsaVietnam Open Infrastructure User Group
143 views28 slides

Similar to SC'16 PMIx BoF Presentation(20)

PMIx Tiered Storage Support by rcastain
PMIx Tiered Storage SupportPMIx Tiered Storage Support
PMIx Tiered Storage Support
rcastain115 views
SC15 PMIx Birds-of-a-Feather by rcastain
SC15 PMIx Birds-of-a-FeatherSC15 PMIx Birds-of-a-Feather
SC15 PMIx Birds-of-a-Feather
rcastain1.2K views
Realtime traffic analyser by Alex Moskvin
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyser
Alex Moskvin174 views
WSO2 Microservices Framework for Java - Product Overview by WSO2
WSO2 Microservices Framework for Java - Product OverviewWSO2 Microservices Framework for Java - Product Overview
WSO2 Microservices Framework for Java - Product Overview
WSO2551 views
PMIx: Bridging the Container Boundary by rcastain
PMIx: Bridging the Container BoundaryPMIx: Bridging the Container Boundary
PMIx: Bridging the Container Boundary
rcastain218 views
SC'17 BoF Presentation by rcastain
SC'17 BoF PresentationSC'17 BoF Presentation
SC'17 BoF Presentation
rcastain34 views
FME World Tour 2016: FME and continuous integration by GIM_nv
FME World Tour 2016: FME and continuous integrationFME World Tour 2016: FME and continuous integration
FME World Tour 2016: FME and continuous integration
GIM_nv580 views
Data Stream Processing with Apache Flink by Fabian Hueske
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
Fabian Hueske4.3K views
Adding Real-time Features to PHP Applications by Ronny López
Adding Real-time Features to PHP ApplicationsAdding Real-time Features to PHP Applications
Adding Real-time Features to PHP Applications
Ronny López2.5K views
From Device to Data Center to Insights: Architectural Considerations for the ... by P. Taylor Goetz
From Device to Data Center to Insights: Architectural Considerations for the ...From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...
P. Taylor Goetz324 views
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex by Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex828 views
MPLAB® Harmony Ecosystem by Design World
MPLAB® Harmony EcosystemMPLAB® Harmony Ecosystem
MPLAB® Harmony Ecosystem
Design World1.4K views
EuroMPI 2017 PMIx presentation by rcastain
EuroMPI 2017 PMIx presentationEuroMPI 2017 PMIx presentation
EuroMPI 2017 PMIx presentation
rcastain133 views
OpenPOWER Acceleration of HPCC Systems by HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
HPCC Systems721 views

Recently uploaded

What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue by
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueShapeBlue
89 views23 slides
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ... by
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...ShapeBlue
46 views28 slides
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... by
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...James Anderson
126 views32 slides
Microsoft Power Platform.pptx by
Microsoft Power Platform.pptxMicrosoft Power Platform.pptx
Microsoft Power Platform.pptxUni Systems S.M.S.A.
61 views38 slides
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...Bernd Ruecker
48 views69 slides
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive by
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveAutomating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveNetwork Automation Forum
43 views35 slides

Recently uploaded(20)

What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue by ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
ShapeBlue89 views
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ... by ShapeBlue
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
ShapeBlue46 views
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... by James Anderson
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
James Anderson126 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker48 views
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive by Network Automation Forum
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveAutomating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT by ShapeBlue
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBITUpdates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
ShapeBlue66 views
Data Integrity for Banking and Financial Services by Precisely
Data Integrity for Banking and Financial ServicesData Integrity for Banking and Financial Services
Data Integrity for Banking and Financial Services
Precisely29 views
Business Analyst Series 2023 - Week 3 Session 5 by DianaGray10
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5
DianaGray10345 views
Five Things You SHOULD Know About Postman by Postman
Five Things You SHOULD Know About PostmanFive Things You SHOULD Know About Postman
Five Things You SHOULD Know About Postman
Postman38 views
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue by ShapeBlue
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlueCloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue
ShapeBlue25 views
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava... by ShapeBlue
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
ShapeBlue28 views
NTGapps NTG LowCode Platform by Mustafa Kuğu
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform
Mustafa Kuğu28 views
PharoJS - Zürich Smalltalk Group Meetup November 2023 by Noury Bouraqadi
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023
Noury Bouraqadi139 views
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ... by Jasper Oosterveld
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
DRBD Deep Dive - Philipp Reisner - LINBIT by ShapeBlue
DRBD Deep Dive - Philipp Reisner - LINBITDRBD Deep Dive - Philipp Reisner - LINBIT
DRBD Deep Dive - Philipp Reisner - LINBIT
ShapeBlue44 views
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue by ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlueElevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
ShapeBlue70 views

SC'16 PMIx BoF Presentation

  • 2. Agenda • Since we last met…  Address some common questions  Outline PMIx standards process • PMIx v1.x release series  What has been included and planned  Review launch performance status • Roadmap  Features in the pipeline  Potential future features • Questions/comments/discussion
  • 3. Charter? • Define  set of agnostic APIs (not affiliated with specific model code base) to support application  system mgmt software (SMS) interactions • Develop  an open source (non-copy-left licensed) standalone “convenience” library to facilitate adoption • Retain  transparent compatibility across all PMI/PMIx versions • Support  the Instant On initiative • Work  to define/implement new APIs for evolving programming models.
  • 4. What Is PMIx? • Standardized APIs  Four header files (client, server, common, tool)  Enable portability across environments  Support interactions between applications and system management stack • Convenience library  Facilitate adoption  Serves as validation platform for standard • Community
  • 5. What Is PMIx? • Standardized APIs  Four header files (client, server, common, tool)  Enable portability across environments  Support interactions between applications and system management stack • Convenience library  Facilitate adoption  Serves as validation platform for standard • Community
  • 6. Required: Caveat • Containerized operations  Require cross-boundary compatibility  Wireup library of containerized app must be compatible with the local resource manager • Issues occur when migrating  Build under one environment using custom implementation  Move to another environment using different implementation • Convenience library mitigates the problem
  • 7. Why Not Part of MPI Forum? • PMIx is agnostic  No concept of “communicator”  No understanding of MPI  Used by non-MPI libraries • Discussions underway  Bring it into Forum process in some appropriate fashion
  • 8. PMIx “Standards” Process • Modifications/additions  Proposed as RFC  Include prototype implementation • Pull request to convenience library  Notification sent to mailing list • Reviews conducted  RFC and implementation  Continues until consensus emerges • Approval given  Developer telecon (2x/week)
  • 9. PMIx Numbering Major . Minor . Release Version of Standard Track convenience library revisions
  • 10. Regression Testing? • Limited direct capability  Run basic API tests on each PR • Extensive embedded testing  Open MPI includes PMIx master, regularly updated  20k+ tests run every night • Tests all spawn, wireup, publish/lookup, connect/disconnect APIs • Not 100% code coverage
  • 11. Adoption? • Already released  SLURM 16.05 (PMIx v1.1.5) • Planned  IBM, Fujitsu, Adaptive Solutions, Altair, Microsoft • Reference server  Provides surrogate support until native support becomes available  Supports full PMIx standard, limited by RM capabilities  Launches network of PMIx servers across allocation
  • 12. Agenda • Since we last met…  Address some common questions  Outline PMIx standards process • PMIx v1.x release series (Artem Polyakov, Mellanox)  What has been included and planned  Review launch performance status • Roadmap  Features in the pipeline  Potential future features • Questions/comments/discussion
  • 13. 0 0.1 0.2 0.3 0.4 0.5 0.6 0 5 10 15 20 25 30 35 MPI_Init (sec) nodes key exchange type: collective direct-fetch direct-fetch/async PMIx/UCX job-start use case Hardware: • 32 nodes • 2 processors (Xeon E5-2680 v2) • 20 cores per node • 1 proc per core • Open MPI v2.1 (modified to enable ability to avoid the barrier at the end of MPI_Init) • PMIx v1.1.5 • UCX (f3f9ad7) * * direct-fetch/async assumes no synchronization barrier inside MPI_Init.
  • 14. 0 0.1 0.2 0.3 0.4 0.5 0.6 0 5 10 15 20 25 30 35 MPI_Init (sec) nodes key exchange type: collective direct-fetch direct-fetch/async PMIx/UCX job-start usecase “allgatherv” on all submitted keys Synchronization overhead
  • 15. v1.2.0 • Extension of v1.1.5  v1.1.5 • Each proc stores own copy of data  v1.2 • Data stored in shared memory owned by PMIx server • Each proc has read-only access • Benefits  Minimizes memory footprint  Faster launch times
  • 16. Shared memory data storage (architecture) RTE daemon MPI process PMIx-client PMIx-server MPI process PMIx-client PMIx Shared mem (store blobs) usock usock • Server provides all the data through the shared memory • Each process can fetch all the data with 0 server-side CPU cycles! • In the case of direct key fetching if a key is not found in the shared memory – a process will request it from the server using regular messaging mechanism.
  • 17. Shared memory data storage (architecture) RTE daemon MPI process PMIx-client PMIx-server MPI process PMIx-client PMIx Shared mem (store blobs) usock usock • Server provides all the data through the shared memory • Each process can fetch all the data with 0 server-side CPU cycles! • In the case of direct key fetching if a key is not found in the shared memory – a process will request it from the server using regular messaging mechanism.
  • 18. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0 100 200 300 400 500 600 700 PMIx_Get(all), sec processes Messages Shmem Shared memory data storage (synthetic performance test) Hardware: • 32 nodes • 2 processors (Intel Xeon E5-2680 v2) • 20 cores per node • 1 proc per core • Open MPI v2.1 • PMIx v1.2 https://github.com/pmix/master/tree/master/contrib/perf_tools • 10 keys per process • 100-element arrays of integers
  • 19. Shared memory data storage (synthetic performance test) [2]
      Nodes  Procs  Messages (µs)            Shmem (µs)
                    local key  remote key    local key  remote key
      1      20     8.8        -             6.4        -
      2      40     8.9        9.2           6.5        6.5
      4      80     8.8        9.2           6.5        6.5
      8      160    8.7        9.2           6.4        6.4
      16     320    8.4        9.2           6.5        6.5
      32     640    8.2        9.2           6.5        6.5
    Advantages: • Stable timings for individual key access (no difference between local and remote key access) • Up to 30% improvement for the remote key fetch • Significant CPU offload on SMP systems with large core counts.
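The "up to 30%" figure can be checked directly from the remote-key timings on this slide. The short calculation below uses only those numbers; the variable and function names are made up for the example.

```python
# nodes -> (messages_us, shmem_us): remote-key fetch times taken from the slide
remote_fetch_us = {
    2: (9.2, 6.5),
    4: (9.2, 6.5),
    8: (9.2, 6.4),
    16: (9.2, 6.5),
    32: (9.2, 6.5),
}


def improvement_pct(messages_us, shmem_us):
    """Relative reduction in fetch time when moving from messages to shmem."""
    return (messages_us - shmem_us) / messages_us * 100.0


gains = {n: improvement_pct(m, s) for n, (m, s) in remote_fetch_us.items()}
best_nodes = max(gains, key=gains.get)

for nodes, gain in sorted(gains.items()):
    print(f"{nodes:>2} nodes: {gain:.1f}% faster remote-key fetch")
```

The largest gain is at 8 nodes, (9.2 - 6.4) / 9.2, or about 30.4%; the other rows come out at about 29.3%.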
  • 20. Shared memory data storage (synthetic performance test) [3]
      Average time per operation (ms)
      Cluster  Operation  PMI2  PMIx msg  PMIx shmem
      CL1      Put        0.20  0.07      0.07
      CL1      Get        0.22  0.08      0.07
      CL2      Put        0.37  0.06      0.06
      CL2      Get        0.32  0.11      0.06
    CL1 Hardware: • 15 nodes • 2 processors (Intel Xeon X5570) • 8 cores per node • 1 proc per core CL2 Hardware: • 64 nodes • 2 processors (Intel Xeon E5-2697 v3) • 28 cores per node • 1 proc per core
  • 21. PMIx Roadmap [Timeline starting 2014] • RM Production Releases • 1/2016: 1.1.3 • 6/2016: 1.1.4 (bug fixes) • 8/2016: 1.1.5 (bug fixes) • 11/2016: 1.2.0 (shared memory datastore) In Pipeline: David Solt, IBM
  • 22. Tool Support • Tool connection support  Allow tools to connect to local PMIx server  Specify system vs application [Diagram: Tool, RM, System PMIx server, mpirun/orted]
  • 23. Tool Support • Query  Network topology • Array of proc network-relative locations • Overall topology (e.g., “dragonfly”)  Running jobs • Currently executing job namespaces • Array of proc location, status, PID  Resources • Available system resources • Array of proc location, resource utilization (ala “top”)  Queue status • Current scheduler queue backlog Examples Debuggers?
  • 24. New Flexibility • Plugin architecture  DLL-based system  Supports proprietary binary components  Allows multiple implementations of common functionality • Buffer pack/unpack operations • Communications (TCP, shared memory,…) • Security
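One way to picture "multiple implementations of common functionality" is a registry that holds several components per functional slot and picks one at run time. The sketch below is hypothetical and is not the PMIx plugin interface: the names `Framework`, `TcpTransport`, and `ShmemTransport` are invented, components are registered in-process rather than loaded from DLLs, and priority-based selection is an assumption.

```python
class Framework:
    """A named slot (e.g. "communications") holding interchangeable components."""

    def __init__(self, name):
        self.name = name
        self._components = {}   # component name -> (priority, factory)

    def register(self, component, priority, factory):
        self._components[component] = (priority, factory)

    def select(self, preferred=None):
        """Return (name, instance) for the preferred or highest-priority component."""
        if not self._components:
            raise LookupError(f"no components registered for {self.name}")
        if preferred is not None:
            if preferred not in self._components:
                raise LookupError(f"{preferred} is not available in {self.name}")
            name = preferred
        else:
            name = max(self._components, key=lambda c: self._components[c][0])
        _, factory = self._components[name]
        return name, factory()


class TcpTransport:
    def send(self, msg):
        return f"tcp:{msg}"


class ShmemTransport:
    def send(self, msg):
        return f"shmem:{msg}"


comms = Framework("communications")
comms.register("tcp", 10, TcpTransport)
comms.register("shmem", 20, ShmemTransport)
```

Calling `comms.select()` here returns the shared-memory component because it has the higher priority, while `comms.select("tcp")` forces the TCP one.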
  • 25. Obsolescence Protection • Plugin architecture • Cross-version support  Automatic detection of client/server version  Properly adjust for changes in structures, protocols  Ensure clients always get what they can understand  Backward support to the v1.1.5 level
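Cross-version support of this kind is often built from two steps: agree on a common version at connect time, then shape outgoing data to what that version understands. The sketch below illustrates that idea only. The negotiation rule (use the older of the two versions), the function names, and the `shmem_path` field are assumptions for the example, not PMIx behavior; only the v1.1.5 floor comes from the slide.

```python
OLDEST_SUPPORTED = (1, 1, 5)   # "Backward support to the v1.1.5 level"


def parse_version(text):
    """Turn "1.2" or "1.2.0" into a comparable 3-tuple such as (1, 2, 0)."""
    parts = tuple(int(p) for p in text.split("."))
    return parts + (0,) * (3 - len(parts))


def negotiate(client_version, server_version):
    """Pick the version both sides can speak (assumed rule: the older one)."""
    client = parse_version(client_version)
    server = parse_version(server_version)
    if client < OLDEST_SUPPORTED:
        raise ValueError(f"client {client_version} is older than the supported floor")
    return min(client, server)


def encode_job_info(info, wire_version):
    """Drop fields the peer would not understand (field names are hypothetical)."""
    fields = ["nspace", "rank"]
    if wire_version >= (1, 2, 0):
        fields.append("shmem_path")   # hypothetical field from a newer format
    return {k: info[k] for k in fields if k in info}
```

With a v1.1.5 client and a v1.2.0 server, `negotiate` settles on (1, 1, 5), and `encode_job_info` then omits the newer field so the client only receives what it can parse.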
  • 26. Notification • Plugin architecture • Cross-version support • Event notification  System generated, app generated  Resolves issues in original API, implementation  Register for broad range of events • Constrained by availability of backend support
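Registering for a broad range of events, from either system or application sources, can be pictured as a small dispatch table. This is a generic sketch rather than the PMIx event API: `EventRegistry` and the event code strings in the usage note are hypothetical.

```python
class EventRegistry:
    """Maps event codes to handlers; a handler may also subscribe to everything."""

    def __init__(self):
        self._handlers = []   # list of (frozenset of codes, or None for all, handler)

    def register(self, handler, codes=None):
        """codes=None registers for all events; otherwise pass an iterable of codes."""
        wanted = None if codes is None else frozenset(codes)
        self._handlers.append((wanted, handler))

    def notify(self, code, source, info=None):
        """Deliver to each matching handler and return how many were called."""
        delivered = 0
        for wanted, handler in self._handlers:
            if wanted is None or code in wanted:
                handler(code, source, info or {})
                delivered += 1
        return delivered
```

For example, one handler could register for `["PROC_ABORTED"]` only while a logging handler registers with `codes=None`; a system-generated `"PROC_ABORTED"` event would then reach both, and an application-generated `"CHECKPOINT_DONE"` event would reach only the logger.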
  • 27. Logging • Plugin architecture • Cross-version support • Event notification • Log data  Store desired data in system data store(s) • Specify hot/warm/cold, local/remote, database and type of database, …  Log output to stdout/err  Supports binary and non-binary data • Heterogeneity taken care of for you
  • 28. PMIx Roadmap [Timeline starting 2014] • RM Production Releases • 1/2016: 1.1.3 • 6/2016: 1.1.4 (bug fixes) • 8/2016: 1.1.5 (bug fixes) • 11/2016: 1.2.0 (shared memory datastore) Future: Ralph Castain, Intel
  • 29. Future Features Reference Server • Initial version: DVM  Interconnected PMIx servers  High-speed, resilient collectives • bcast, allgather/barrier • Future updates: “fill” mode  Servers proxy clients to host RM  Complete missing host functionality Winter 2017
  • 30. Future Features Debugger Support • Ongoing discussions with MPI Forum Tools WG  Implement proposed MPIR2 interface  Enhance scalability • Exploit tool connection  Obtain proctable info  Use PMIx_Spawn to launch daemons, auto-wireup, localize proctable retrieval • Extend available supporting info  Network topology, bandwidth utilization  Event notification Winter 2017
  • 31. Future Features Network Support Framework • Interface to 3rd party libraries • Enable support for network features  Precondition of network security keys  Retrieval of endpoint assignments, topology • Data made available  In initial job info returned at proc start  Retrieved by Query Spring 2017
  • 32. Future Features IO Support • Reduce launch time  Current practices • Reactive cache/forward • Static builds  Proactive pre-positioning • Examine provided job/script • Return array of binaries and libraries required for execution • Enhance execution  Request async file positioning • Callback when ready  Specify persistence options Summer 2017
  • 33. Future Features Generalized Data Store (GDS) • Abstracted view of data store  Multiple plugins for different implementations • Local (hot) storage • Distributed (warm) models • Database (cold) storage • Explore alternative paradigms  Job info, wireup data  Publish/lookup  Log Fall 2017
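An abstracted view over hot, warm, and cold storage could look roughly like a lookup that walks the tiers in order. The sketch below is speculative and not a description of the planned GDS: `TieredStore` is a hypothetical name, plain dictionaries stand in for the real backends, and promoting a value to the hottest tier after a hit is an assumption made for the example.

```python
class TieredStore:
    """Looks a key up across ordered tiers, hottest first (illustrative only)."""

    def __init__(self, tier_names):
        # dicts stand in for real backends (local, distributed, database)
        self._tiers = [(name, {}) for name in tier_names]

    def put(self, key, value, tier):
        for name, data in self._tiers:
            if name == tier:
                data[key] = value
                return
        raise KeyError(f"unknown tier: {tier}")

    def get(self, key):
        """Return (value, name of the tier the value was found in)."""
        hottest = self._tiers[0][1]
        for name, data in self._tiers:
            if key in data:
                value = data[key]
                if data is not hottest:
                    hottest[key] = value   # assumed policy: promote on hit
                return value, name
        raise KeyError(key)
```

A value written to the "cold" tier is found there on the first `get`, and on the second `get` it is served from the "hot" tier.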
  • 34. Open Discussion We now have an interface library that the RMs will support for application-directed requests. We need to collaboratively define what we want to do with it. Project: https://pmix.github.io/master Code: https://github.com/pmix