RM SCON
Annapurna Dasari
Intel Inc.
RM Overlay Networks
• An overlay network of the resident RM daemons (i.e., the
slurmd/orcmd overlay network). Spans the daemons in
the allocation. Lifetime: session start to session end,
covering single or multiple job executions.
• An overlay network of the job step daemons
responsible for launch and management of the MPI
processes.
– One participant per node.
– Spans across all step daemons where the job is executing.
– Lifetime: job launch – job termination.
• Embedded overlay networks that use the RM's native
communication.
Additional RM Overlay Networks
• The RM may create an additional special-purpose
overlay network outside of its own integrated
communication system for:
– Reliable broadcast: event notification, job launch.
– Scalable collectives: initial wire-up, out-of-band (OOB)
requests for job information by debuggers or other tools.
– High-speed fabrics: RM daemons can speed up
communications by using an overlay network that sends
messages over a high-speed fabric and has the hooks
needed to take advantage of the fabric's capabilities
(for example, providing the required QoS).
Additional RM Overlay Networks
• RMs can create an ad hoc overlay network on
demand for a specific purpose, or they can
offload all further communications to the
newly created overlay network.
• The SCON library provides an interface to create a
SCON, send and receive point-to-point messages, and
perform broadcast, allgather, and barrier operations.
PMIx Event Notification SCON
• PMIx provides a capability that allows MPI applications
to request that the RM notify them of error events relevant
to them. They can also request that the RM propagate
events they detect to their peers.
• The RM local PMIx server is required to provide reliable
notification of the error events to all interested parties
(processes could belong to other jobs).
• The RM local PMIx server could create a SCON among
the requested parties and broadcast the event notification
on the SCON.
PMIx Event Notification SCON
[Figure: eight boxes, each labeled "local RM PMIx Server"]
Event Notification SCON Creation
• The SCON participants consist of:
– All local RM (PMIx server) daemons for the specific job,
plus
– All local RM daemons of the jobs to which processes of the
specified job are connected: the parent process that spawned
the job and any explicit connections requested via PMIx
connect.
• The SCON can be created during PMIx server
initialization or on demand.
• May RM PMIx server daemons join and leave the SCON
according to PMIx connect/disconnect requests?
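The participant rule above can be sketched as a set union. This is only an illustration under assumed data structures: the function name `scon_participants`, the `job_daemons` and `connections` dictionaries, and the job and node names are all hypothetical and are not part of the SCON library or PMIx.

```python
def scon_participants(job, job_daemons, connections):
    """Return the sorted list of nodes whose local RM daemon joins the SCON.

    job         -- id of the job that requested event notification
    job_daemons -- dict: job id -> set of nodes hosting that job's daemons
    connections -- dict: job id -> set of job ids it is connected to
                   (the spawning parent, explicit connect peers)
    All three structures are hypothetical stand-ins for RM state.
    """
    members = set(job_daemons[job])            # daemons of the job itself
    for peer in connections.get(job, ()):      # plus daemons of connected jobs
        members |= job_daemons.get(peer, set())
    return sorted(members)


job_daemons = {
    "job1": {"node0", "node1"},
    "job2": {"node1", "node2"},   # parent that spawned job1
    "job3": {"node5"},            # unrelated job, must not be included
}
connections = {"job1": {"job2"}}

print(scon_participants("job1", job_daemons, connections))
```

A node that hosts both jobs (node1 here) appears once, since there is one local RM daemon per node.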
Creating an RM SCON
• The participant daemons create the SCON by
calling the create API.
• The participant list is provided by the daemon.
• Info keys specify fabric selection and topology.
• Reliable broadcast capability is requested during
create by specifying the corresponding info key.
• The topology is a tree with the rank 0
daemon at the root.
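One way to picture a tree topology rooted at rank 0 is a fixed-radix tree computed from ranks alone. The slides do not state the tree shape, so the radix, the numbering scheme, and the helper names `tree_parent` and `tree_children` below are assumptions made for illustration.

```python
def tree_parent(rank, radix=2):
    """Parent of `rank` in a radix-ary tree rooted at rank 0 (None for the root)."""
    return None if rank == 0 else (rank - 1) // radix


def tree_children(rank, size, radix=2):
    """Children of `rank` among `size` participants numbered 0..size-1."""
    first = rank * radix + 1
    return [c for c in range(first, first + radix) if c < size]


# Six daemons, binary tree: 0 -> (1, 2), 1 -> (3, 4), 2 -> (5)
for r in range(6):
    print(r, tree_parent(r), tree_children(r, 6))
```

With this numbering every daemon can derive its neighbors from its own rank and the participant count, without exchanging any topology messages.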
Creating an RM SCON
• In response, the SCON library wires up the
SCON according to the topology and completes
the create operation when all participants
have joined.
• The SCON is then operational. Typical operations
include:
– Broadcast of event notifications.
– Allgather/barrier for fence operations.
– Point-to-point send/recv?
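The "create completes when all participants join" rule can be sketched as simple bookkeeping. The class `PendingCreate` and its methods are hypothetical and only model the completion condition; they say nothing about how the SCON library actually tracks membership.

```python
class PendingCreate:
    """Tracks which expected participants have joined a SCON being created."""

    def __init__(self, participants):
        self.expected = set(participants)
        self.joined = set()

    @property
    def complete(self):
        # The create operation completes only once every participant joined.
        return self.joined == self.expected

    def join(self, daemon):
        """Record a join; return True if this join completes the create."""
        if daemon not in self.expected:
            raise ValueError(f"{daemon} is not in the participant list")
        self.joined.add(daemon)
        return self.complete


create = PendingCreate(["node0", "node1", "node2"])
print(create.join("node0"))  # False: still waiting for node1 and node2
print(create.join("node1"))  # False
print(create.join("node2"))  # True: the SCON is now operational
```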
Joining an RM SCON
• Additional RM daemons may join an existing RM SCON
in response to connect requests.
• The joining daemon must give the library the information
identifying the existing SCON it wants to join.
• The library would join the new member using internal
xcast and allgather operations among all members.
• Should we provide a new scon_join API, or enable join by
specifying a join directive in the create info keys?
• Do we need to support join at all?
• Do existing participants learn about the new member
outside of the SCON? How do RMs handle connect
requests today?
Reliable xcast on RM SCON
• The notifying daemon broadcasts the event
notification message to all members by calling the
scon_xcast API.
• Is it required, nice to have, or not required to know:
– When the xcast has completed?
– The xcast completion status, with detailed error
information?
• What would the RM daemon do if the xcast
cannot be delivered to some of the participant
daemons?
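To see why a completion status matters, the following simulation relays a message down a binary tree and reports which ranks were not reached. It is a sketch, not the scon_xcast implementation: the function `simulate_xcast`, the `down` set, the fixed radix, and the choice of rank 0 as originator are all assumptions, and the relay shown is deliberately naive (no retries or rerouting).

```python
def simulate_xcast(size, down=frozenset(), radix=2):
    """Relay a message from rank 0 down a radix-ary tree.

    A daemon in `down` neither receives nor forwards, so its whole
    subtree is cut off. Returns (delivered, undelivered) rank lists.
    """
    delivered = []
    frontier = [] if 0 in down else [0]
    while frontier:
        rank = frontier.pop(0)
        delivered.append(rank)
        first = rank * radix + 1
        for child in range(first, min(first + radix, size)):
            if child not in down:
                frontier.append(child)
    undelivered = [r for r in range(size) if r not in delivered]
    return delivered, undelivered


delivered, undelivered = simulate_xcast(7, down={1})
print("delivered:  ", delivered)
print("undelivered:", undelivered)
```

In this run a single failed daemon (rank 1) also hides its children (ranks 3 and 4), which is the kind of detail a per-participant completion status would expose to the RM.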
Allgather/Barrier on RM SCON
• Perform a barrier operation by calling the
scon_barrier API.
• An allgather API is to be added.
• Brainstorm:
– Reliable allgather/barrier: being able to
synchronize among the live participants?
– Allgather/barrier timeout: each process releases
itself from the barrier upon timeout?
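The timeout idea can be tried out in a single process with Python's standard `threading.Barrier`, which already lets a waiter release itself after a timeout. This is only an analogy for the behavior under discussion, with threads standing in for daemons; it does not use or model the scon_barrier API, and the helper `run_barrier` is hypothetical.

```python
import threading


def run_barrier(parties, arriving, timeout):
    """Start `arriving` threads on a barrier sized for `parties`.

    Returns a dict rank -> "released" or "timed out".
    """
    barrier = threading.Barrier(parties)
    results = {}

    def participant(rank):
        try:
            barrier.wait(timeout=timeout)
            results[rank] = "released"
        except threading.BrokenBarrierError:
            # Raised when this waiter times out, or when a peer's timeout
            # broke the barrier; either way the participant is set free.
            results[rank] = "timed out"

    threads = [threading.Thread(target=participant, args=(r,))
               for r in range(arriving)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_barrier(parties=3, arriving=3, timeout=5.0))  # everyone arrives
print(run_barrier(parties=3, arriving=2, timeout=0.2))  # one participant missing
```

When one participant never arrives, the first timeout breaks the barrier for all waiters, so no one stays blocked; whether the survivors should then re-synchronize among themselves is the "reliable barrier" question raised above.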
