SlideShare a Scribd company logo
1 of 13
PMIx: Process Management for
Exascale Environments
Agenda
• State of the Community
 Ralph H. Castain (Intel)
• Scaled Performance
 Aurelien Bouteiller (UTK)
• PMIx Standards Document
 Josh Hursey (IBM)
• Q&A
Where Are We?
• Launch scaling
 Wireup enhancements complete
 Fabric “instant on” enablement underway
• Results
 Tracks spawn propagation time
 Exascale in < 5 seconds
 3rd party confirmation
PMIx Launch Sequence
*RM daemon, mpirun-daemon, etc.
PMIx Launch Sequence
*RM daemon, mpirun-daemon, etc.
Current Support (I)
• Typical startup operations
 Put, get, commit, barrier,
spawn, [dis]connect,
publish/lookup
• Tool connections
 Debugger, job submission,
query
• Generalized query
support
 Job status, layout, system
data, resource availability
• Event notification
 App, system generated
 Subscribe, chained
 Preemption, failures,
timeout warning, …
• Logging
 Status reports, error output
• Flexible allocations
 Release resources, request
resources
Current Support (II)
• Network support
 Security keys, pre-
spawn local driver
setup
• Obsolescence
protection
 Automatic cross-
version compatibility
 Container support
• Job control
 Pause, kill, signal,
heartbeat, resilience
support (C/R
coordination)
• Async definition of
process groups
 Rolling
startup/teardown
In Pipeline
• Network support
 Fabric topology and status,
traffic reports, fabric manager
interaction
• MPI Sessions support
 Rolling startup/teardown
• Generalized data store
 Distributed key-value storage
• Security
 Obtain and validate credentials
for application/SMS
• File system support
 Dependency detection
 Tiered storage caching strategies
• Debugger/tool support++
 Automatic rendezvous
 Single interface to all launchers
 Co-launch daemons
 Access fabric info, etc.
• Cross-library interoperation
 OpenMP/MPI coordination
Nearing completion Ramping up Idling
Three Distinct Entities
• PMIx Standard
 Defined set of APIs, attribute strings
 Nothing about implementation
• PMIx Reference Library
 A full-featured implementation of the Standard
 Intended to ease adoption
• PMIx Reference Server
 Full-featured “shim” to a non-PMIx RM
 Provides development environment
Standards Doc
under
development!
Adoption
• RMs
 SLURM, JSM complete – Fujitsu underway
 Altair ramping up
• Libraries
 OpenMPI, OSHMEM, SOS complete
 GASNet, ORNLshmem – in PR
 MPICH to come (1Q2018?)
• Tools
 Debugger integration under development
Agenda
• State of the Community
 Ralph H. Castain (Intel)
• Scaled Performance
 Aurelien Bouteiller (UTK)
• PMIx Standards Document
 Josh Hursey (IBM)
• Q&A
Agenda
• State of the Community
 Ralph H. Castain (Intel)
• Scaled Performance
 Aurelien Bouteiller (UTK)
• PMIx Standards Document
 Josh Hursey (IBM)
• Q&A
Agenda
• State of the Community
 Ralph H. Castain (Intel)
• Scaled Performance
 Aurelien Bouteiller (UTK)
• PMIx Standards Document
 Josh Hursey (IBM)
• Q&A

More Related Content

What's hot

What's hot (15)

Firewall
FirewallFirewall
Firewall
 
CISSP Prep: Ch 4. Security Engineering (Part 2)
CISSP Prep: Ch 4. Security Engineering (Part 2)CISSP Prep: Ch 4. Security Engineering (Part 2)
CISSP Prep: Ch 4. Security Engineering (Part 2)
 
SC'16 PMIx BoF Presentation
SC'16 PMIx BoF PresentationSC'16 PMIx BoF Presentation
SC'16 PMIx BoF Presentation
 
Firewall
FirewallFirewall
Firewall
 
CNIT 123 12: Cryptography
CNIT 123 12: CryptographyCNIT 123 12: Cryptography
CNIT 123 12: Cryptography
 
Log management &amp; SIEM
Log management &amp; SIEMLog management &amp; SIEM
Log management &amp; SIEM
 
CNIT 121: 11 Analysis Methodology
CNIT 121: 11 Analysis MethodologyCNIT 121: 11 Analysis Methodology
CNIT 121: 11 Analysis Methodology
 
CNIT 152 11 Analysis Methodology
CNIT 152 11 Analysis MethodologyCNIT 152 11 Analysis Methodology
CNIT 152 11 Analysis Methodology
 
Michael Jones-Resume-OCT2015
Michael Jones-Resume-OCT2015Michael Jones-Resume-OCT2015
Michael Jones-Resume-OCT2015
 
CNIT 121: 16 Report Writing
CNIT 121: 16 Report WritingCNIT 121: 16 Report Writing
CNIT 121: 16 Report Writing
 
CNIT 152: 3 Pre-Incident Preparation
CNIT 152: 3 Pre-Incident PreparationCNIT 152: 3 Pre-Incident Preparation
CNIT 152: 3 Pre-Incident Preparation
 
CNIT 152: 1 Real-World Incidents
CNIT 152: 1 Real-World IncidentsCNIT 152: 1 Real-World Incidents
CNIT 152: 1 Real-World Incidents
 
Core Data in Modern Times
Core Data in Modern TimesCore Data in Modern Times
Core Data in Modern Times
 
CNIT 152 8. Forensic Duplication
CNIT 152 8. Forensic DuplicationCNIT 152 8. Forensic Duplication
CNIT 152 8. Forensic Duplication
 
1. Network Security Monitoring Rationale
1. Network Security Monitoring Rationale1. Network Security Monitoring Rationale
1. Network Security Monitoring Rationale
 

Similar to SC'17 BoF Presentation

PMIx Tiered Storage Support
PMIx Tiered Storage SupportPMIx Tiered Storage Support
PMIx Tiered Storage Supportrcastain
 
EuroMPI 2017 PMIx presentation
EuroMPI 2017 PMIx presentationEuroMPI 2017 PMIx presentation
EuroMPI 2017 PMIx presentationrcastain
 
SC'18 BoF Presentation
SC'18 BoF PresentationSC'18 BoF Presentation
SC'18 BoF Presentationrcastain
 
SC15 PMIx Birds-of-a-Feather
SC15 PMIx Birds-of-a-FeatherSC15 PMIx Birds-of-a-Feather
SC15 PMIx Birds-of-a-Featherrcastain
 
Computer system organization
Computer system organizationComputer system organization
Computer system organizationSyed Zaid Irshad
 
Slide Deck CISSP Class Session 5
Slide Deck CISSP Class Session 5Slide Deck CISSP Class Session 5
Slide Deck CISSP Class Session 5FRSecure
 
PMIx Updated Overview
PMIx Updated OverviewPMIx Updated Overview
PMIx Updated OverviewRalph Castain
 
Pm ix tutorial-june2019-pub (1)
Pm ix tutorial-june2019-pub (1)Pm ix tutorial-june2019-pub (1)
Pm ix tutorial-june2019-pub (1)ewerkboy
 
Production Ready Microservices at Scale
Production Ready Microservices at ScaleProduction Ready Microservices at Scale
Production Ready Microservices at ScaleRajeev Bharshetty
 
Using Advanced Threat Analytics to Prevent Privilege Escalation Attacks
Using Advanced Threat Analytics to Prevent Privilege Escalation AttacksUsing Advanced Threat Analytics to Prevent Privilege Escalation Attacks
Using Advanced Threat Analytics to Prevent Privilege Escalation AttacksBeyondTrust
 
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...Codemotion
 
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National PoliceCodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National PoliceBert Jan Schrijver
 
Enterprise Security in Mainframe-Connected Environments
Enterprise Security in Mainframe-Connected EnvironmentsEnterprise Security in Mainframe-Connected Environments
Enterprise Security in Mainframe-Connected EnvironmentsPrecisely
 
CNIT 121: 3 Pre-Incident Preparation
CNIT 121: 3 Pre-Incident PreparationCNIT 121: 3 Pre-Incident Preparation
CNIT 121: 3 Pre-Incident PreparationSam Bowne
 
Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...Ola Spjuth
 
iSense Java Summit 2017 - Microservices in action at the Dutch National Police
iSense Java Summit 2017 - Microservices in action at the Dutch National PoliceiSense Java Summit 2017 - Microservices in action at the Dutch National Police
iSense Java Summit 2017 - Microservices in action at the Dutch National PoliceBert Jan Schrijver
 
2018 FRecure CISSP Mentor Program- Session 4
2018 FRecure CISSP Mentor Program- Session 42018 FRecure CISSP Mentor Program- Session 4
2018 FRecure CISSP Mentor Program- Session 4FRSecure
 
Devoxx PL 2018 - Microservices in action at the Dutch National Police
Devoxx PL 2018 - Microservices in action at the Dutch National PoliceDevoxx PL 2018 - Microservices in action at the Dutch National Police
Devoxx PL 2018 - Microservices in action at the Dutch National PoliceBert Jan Schrijver
 

Similar to SC'17 BoF Presentation (20)

PMIx Tiered Storage Support
PMIx Tiered Storage SupportPMIx Tiered Storage Support
PMIx Tiered Storage Support
 
EuroMPI 2017 PMIx presentation
EuroMPI 2017 PMIx presentationEuroMPI 2017 PMIx presentation
EuroMPI 2017 PMIx presentation
 
Microservices
MicroservicesMicroservices
Microservices
 
SC'18 BoF Presentation
SC'18 BoF PresentationSC'18 BoF Presentation
SC'18 BoF Presentation
 
StarlingX - A Platform for the Distributed Edge | Ildiko Vancsa
StarlingX - A Platform for the Distributed Edge | Ildiko VancsaStarlingX - A Platform for the Distributed Edge | Ildiko Vancsa
StarlingX - A Platform for the Distributed Edge | Ildiko Vancsa
 
SC15 PMIx Birds-of-a-Feather
SC15 PMIx Birds-of-a-FeatherSC15 PMIx Birds-of-a-Feather
SC15 PMIx Birds-of-a-Feather
 
Computer system organization
Computer system organizationComputer system organization
Computer system organization
 
Slide Deck CISSP Class Session 5
Slide Deck CISSP Class Session 5Slide Deck CISSP Class Session 5
Slide Deck CISSP Class Session 5
 
PMIx Updated Overview
PMIx Updated OverviewPMIx Updated Overview
PMIx Updated Overview
 
Pm ix tutorial-june2019-pub (1)
Pm ix tutorial-june2019-pub (1)Pm ix tutorial-june2019-pub (1)
Pm ix tutorial-june2019-pub (1)
 
Production Ready Microservices at Scale
Production Ready Microservices at ScaleProduction Ready Microservices at Scale
Production Ready Microservices at Scale
 
Using Advanced Threat Analytics to Prevent Privilege Escalation Attacks
Using Advanced Threat Analytics to Prevent Privilege Escalation AttacksUsing Advanced Threat Analytics to Prevent Privilege Escalation Attacks
Using Advanced Threat Analytics to Prevent Privilege Escalation Attacks
 
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
 
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National PoliceCodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
 
Enterprise Security in Mainframe-Connected Environments
Enterprise Security in Mainframe-Connected EnvironmentsEnterprise Security in Mainframe-Connected Environments
Enterprise Security in Mainframe-Connected Environments
 
CNIT 121: 3 Pre-Incident Preparation
CNIT 121: 3 Pre-Incident PreparationCNIT 121: 3 Pre-Incident Preparation
CNIT 121: 3 Pre-Incident Preparation
 
Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...
 
iSense Java Summit 2017 - Microservices in action at the Dutch National Police
iSense Java Summit 2017 - Microservices in action at the Dutch National PoliceiSense Java Summit 2017 - Microservices in action at the Dutch National Police
iSense Java Summit 2017 - Microservices in action at the Dutch National Police
 
2018 FRecure CISSP Mentor Program- Session 4
2018 FRecure CISSP Mentor Program- Session 42018 FRecure CISSP Mentor Program- Session 4
2018 FRecure CISSP Mentor Program- Session 4
 
Devoxx PL 2018 - Microservices in action at the Dutch National Police
Devoxx PL 2018 - Microservices in action at the Dutch National PoliceDevoxx PL 2018 - Microservices in action at the Dutch National Police
Devoxx PL 2018 - Microservices in action at the Dutch National Police
 

Recently uploaded

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Recently uploaded (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

SC'17 BoF Presentation

  • 1. PMIx: Process Management for Exascale Environments
  • 2. Agenda • State of the Community  Ralph H. Castain (Intel) • Scaled Performance  Aurelien Bouteiller (UTK) • PMIx Standards Document  Josh Hursey (IBM) • Q&A
  • 3. Where Are We? • Launch scaling  Wireup enhancements complete  Fabric “instant on” enablement underway • Results  Tracks spawn propagation time  Exascale in < 5 seconds  3rd party confirmation
  • 4. PMIx Launch Sequence *RM daemon, mpirun-daemon, etc.
  • 5. PMIx Launch Sequence *RM daemon, mpirun-daemon, etc.
  • 6. Current Support (I) • Typical startup operations  Put, get, commit, barrier, spawn, [dis]connect, publish/lookup • Tool connections  Debugger, job submission, query • Generalized query support  Job status, layout, system data, resource availability • Event notification  App, system generated  Subscribe, chained  Preemption, failures, timeout warning, … • Logging  Status reports, error output • Flexible allocations  Release resources, request resources
  • 7. Current Support (II) • Network support  Security keys, pre- spawn local driver setup • Obsolescence protection  Automatic cross- version compatibility  Container support • Job control  Pause, kill, signal, heartbeat, resilience support (C/R coordination) • Async definition of process groups  Rolling startup/teardown
  • 8. In Pipeline • Network support  Fabric topology and status, traffic reports, fabric manager interaction • MPI Sessions support  Rolling startup/teardown • Generalized data store  Distributed key-value storage • Security  Obtain and validate credentials for application/SMS • File system support  Dependency detection  Tiered storage caching strategies • Debugger/tool support++  Automatic rendezvous  Single interface to all launchers  Co-launch daemons  Access fabric info, etc. • Cross-library interoperation  OpenMP/MPI coordination Nearing completion Ramping up Idling
  • 9. Three Distinct Entities • PMIx Standard  Defined set of APIs, attribute strings  Nothing about implementation • PMIx Reference Library  A full-featured implementation of the Standard  Intended to ease adoption • PMIx Reference Server  Full-featured “shim” to a non-PMIx RM  Provides development environment Standards Doc under development!
  • 10. Adoption • RMs  SLURM, JSM complete – Fujitsu underway  Altair ramping up • Libraries  OpenMPI, OSHMEM, SOS complete  GASNet, ORNLshmem – in PR  MPICH to come (1Q2018?) • Tools  Debugger integration under development
  • 11. Agenda • State of the Community  Ralph H. Castain (Intel) • Scaled Performance  Aurelien Bouteiller (UTK) • PMIx Standards Document  Josh Hursey (IBM) • Q&A
  • 12. Agenda • State of the Community  Ralph H. Castain (Intel) • Scaled Performance  Aurelien Bouteiller (UTK) • PMIx Standards Document  Josh Hursey (IBM) • Q&A
  • 13. Agenda • State of the Community  Ralph H. Castain (Intel) • Scaled Performance  Aurelien Bouteiller (UTK) • PMIx Standards Document  Josh Hursey (IBM) • Q&A