SlideShare a Scribd company logo
Singularity
On-node Resource Manager for Containerized HPC
Workloads
Geoffroy Vallee, Carlos Eduardo Arango Gutierrez, Cedric Clerget
Canopie HPC - Super Computing 2019
Introduction
● HPC workloads are becoming more complex, e.g., MPI+X
○ Multiple runtimes are now running side by side
○ Most runtimes assume by default that all available resources can be used
○ Containers are becoming a standard way to run scientific workloads
● HPC systems are becoming more complex
○ Current trend is for bigger, more powerful compute nodes
○ Multiple accelerators per node
Complex resource partitioning and sharing problems
The community started to discuss about internal resource
manager for applications
Challenges
The venue of new systems, containers and workloads requiring multiple
runtimes create new challenges
➔ It creates a deep hierarchy of execution contexts
◆ processes & threads running on the host
◆ containers and everything running inside the containers
➔ How can we precisely partition resources between?
➔ How can we share hardware resources?
➔ How can we assign hardware resources to process/threads that are
potentially running deep in the hierarchy?
Goals
➔ Design a solution to be deployed on current and future systems
◆ Integration with global scheduler
◆ Support legacy scientific workloads, i.e., MPI
◆ Support upcoming scientific workloads, e.g., MPI+X and beyond
➔ Scalable resource management that covers containers
➔ Hierarchical architecture
◆ Scalable
◆ Compatible with existing infrastructure
◆ Compatible with existing programming languages (e.g., MPI) and runtimes
➔ Minimize modifications to existing implementations, specifications
and standards
PMIx Overview
● Process Management Interface for eXascale
● PMIx is both a specification/standard and an implementation
(OpenPMIx)
● Provides all the building blocks for the implementation of distributed
asynchronous runtimes
○ Client/server model for scalability
○ Interface to resource managers
○ Event based architecture
● Now available for most of major MPI implementations
● Use by industry, government, academia
Singularity Overview
● Specially designed for HPC
● Container solution that runs strictly under the user’s ID (no privileged
access required)
● No heavy daemon required to run on the node (no system noise)
● Works well with parallel file system
● Let users
○ Package their application and data; bring and execute the package to virtually any
system
○ Low to no-overhead while executing applications
○ Encrypt their containers
○ Sign and verify their containers
Architecture Overview
➔ Add a new PMIx thread to the Singularity runtime
➔ Interface Singularity with the global resource manager
➔ Ease the mapping of resources
Extensions to PMIx
➔ From a standard point-of-view
◆ PMIx already provides all the required interfaces and semantics
◆ Points of interest
● Semantic mapping between PMIx and Singularity namespaces
● Tool interface
● Allocation API: PMIX_ALLOC_NEW and PMIX_ALLOC_EXTEND can be
leveraged to have a flexible way to add more resources to containers (when
required resources are not known upfront)
➔ From an implementation point-of-view
◆ Need to extend an implementation to support Singularity containers
● It would be useful to have the PMIx runtime being capable of managing
containers
● OpenPMIx is a good candidate
Extensions to Singularity
Add a PMIx thread to the
Singularity runtime
➔ Act as a PMIx client
◆ What resources are being
assigned to the container?
➔ Act as a PMIx server
◆ How resources are assigned
to what is running in the
container
Extensions to Singularity (2)
➔ Ultimately creates a hierarchy of PMIx clients/servers
◆ Easier to map, allocate and manage resources assigned and used by containers
◆ Gives a better control over assigning processes and threads running inside the
container onto resources (e.g., MPI binding)
➔ Leverage the PMIx tool interface for debugging and profiling
➔ Enable runtime coordination
◆ Leverage PMIx event, communication and synchronization mechanisms
Conclusion
We propose an extension of the Singularity containers runtime and PMIx
implementation to provide:
➔ A scalable resource manager for complex workloads that include
containers
➔ A solution compatible with existing solutions and infrastructures
➔ New capabilities to assign, share and/or partition resources between
different runtimes (e.g., MPI, OpenMP, Singularity)

More Related Content

What's hot

Loadbalancers: The fabric for your micro services
Loadbalancers: The fabric for your micro servicesLoadbalancers: The fabric for your micro services
Loadbalancers: The fabric for your micro services
Chiradeep Vittal
 
Open Tracing, to order and understand your mess. - ApiConf 2017
Open Tracing, to order and understand your mess. - ApiConf 2017Open Tracing, to order and understand your mess. - ApiConf 2017
Open Tracing, to order and understand your mess. - ApiConf 2017
Gianluca Arbezzano
 
Into the cold - Object Storage in SWITCHengines
Into the cold - Object Storage in SWITCHenginesInto the cold - Object Storage in SWITCHengines
Into the cold - Object Storage in SWITCHengines
Simon Leinen
 
What next after microservices
What next after microservicesWhat next after microservices
What next after microservices
Bilgin Ibryam
 
OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...
OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...
OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...
OpenNebula Project
 
GitOps (& Flux) for Helm Users with Scott Rigby
GitOps (& Flux) for Helm Users with Scott RigbyGitOps (& Flux) for Helm Users with Scott Rigby
GitOps (& Flux) for Helm Users with Scott Rigby
Weaveworks
 
Logs/Metrics Gathering With OpenShift EFK Stack
Logs/Metrics Gathering With OpenShift EFK StackLogs/Metrics Gathering With OpenShift EFK Stack
Logs/Metrics Gathering With OpenShift EFK Stack
Josef Karásek
 
ppt presentation
ppt presentationppt presentation
ppt presentation
webhostingguy
 
Overview and Opentracing in theory by Gianluca Arbezzano
Overview and Opentracing in theory by Gianluca ArbezzanoOverview and Opentracing in theory by Gianluca Arbezzano
Overview and Opentracing in theory by Gianluca Arbezzano
Gianluca Arbezzano
 
Netflix Titus WASP October 2017
Netflix Titus WASP October 2017Netflix Titus WASP October 2017
Netflix Titus WASP October 2017
Andrew Leung
 
OpenNebulaConf2017EU: Providing cloud and Managed Hosting Environment by Mich...
OpenNebulaConf2017EU: Providing cloud and Managed Hosting Environment by Mich...OpenNebulaConf2017EU: Providing cloud and Managed Hosting Environment by Mich...
OpenNebulaConf2017EU: Providing cloud and Managed Hosting Environment by Mich...
OpenNebula Project
 
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
44CON
 
Cephfs - Red Hat Openstack and Ceph meetup, Pune 28th november 2015
Cephfs - Red Hat Openstack and Ceph meetup, Pune 28th november 2015Cephfs - Red Hat Openstack and Ceph meetup, Pune 28th november 2015
Cephfs - Red Hat Openstack and Ceph meetup, Pune 28th november 2015
bipin kunal
 
Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...
LibbySchulze
 
Kubernetes 1.12 Update and Container Security with Liz Rice
Kubernetes 1.12 Update and Container Security with Liz RiceKubernetes 1.12 Update and Container Security with Liz Rice
Kubernetes 1.12 Update and Container Security with Liz Rice
CloudOps2005
 
Network Support for Cloud Applications - Felix Cuadrado
Network Support for Cloud Applications - Felix CuadradoNetwork Support for Cloud Applications - Felix Cuadrado
Network Support for Cloud Applications - Felix Cuadrado
CPqD
 
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...
OpenNebula Project
 
Tracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingTracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracing
Hemant Kumar
 
The Evolution of Distributed Systems on Kubernetes
The Evolution of Distributed Systems on KubernetesThe Evolution of Distributed Systems on Kubernetes
The Evolution of Distributed Systems on Kubernetes
Bilgin Ibryam
 
Kubecon 2019_eu-k8s-secrets-csi
Kubecon 2019_eu-k8s-secrets-csiKubecon 2019_eu-k8s-secrets-csi
Kubecon 2019_eu-k8s-secrets-csi
Rita Zhang
 

What's hot (20)

Loadbalancers: The fabric for your micro services
Loadbalancers: The fabric for your micro servicesLoadbalancers: The fabric for your micro services
Loadbalancers: The fabric for your micro services
 
Open Tracing, to order and understand your mess. - ApiConf 2017
Open Tracing, to order and understand your mess. - ApiConf 2017Open Tracing, to order and understand your mess. - ApiConf 2017
Open Tracing, to order and understand your mess. - ApiConf 2017
 
Into the cold - Object Storage in SWITCHengines
Into the cold - Object Storage in SWITCHenginesInto the cold - Object Storage in SWITCHengines
Into the cold - Object Storage in SWITCHengines
 
What next after microservices
What next after microservicesWhat next after microservices
What next after microservices
 
OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...
OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...
OpenNebulaConf2017EU: Welcome Talk State and Future of OpenNebula by Ignacio ...
 
GitOps (& Flux) for Helm Users with Scott Rigby
GitOps (& Flux) for Helm Users with Scott RigbyGitOps (& Flux) for Helm Users with Scott Rigby
GitOps (& Flux) for Helm Users with Scott Rigby
 
Logs/Metrics Gathering With OpenShift EFK Stack
Logs/Metrics Gathering With OpenShift EFK StackLogs/Metrics Gathering With OpenShift EFK Stack
Logs/Metrics Gathering With OpenShift EFK Stack
 
ppt presentation
ppt presentationppt presentation
ppt presentation
 
Overview and Opentracing in theory by Gianluca Arbezzano
Overview and Opentracing in theory by Gianluca ArbezzanoOverview and Opentracing in theory by Gianluca Arbezzano
Overview and Opentracing in theory by Gianluca Arbezzano
 
Netflix Titus WASP October 2017
Netflix Titus WASP October 2017Netflix Titus WASP October 2017
Netflix Titus WASP October 2017
 
OpenNebulaConf2017EU: Providing cloud and Managed Hosting Environment by Mich...
OpenNebulaConf2017EU: Providing cloud and Managed Hosting Environment by Mich...OpenNebulaConf2017EU: Providing cloud and Managed Hosting Environment by Mich...
OpenNebulaConf2017EU: Providing cloud and Managed Hosting Environment by Mich...
 
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
 
Cephfs - Red Hat Openstack and Ceph meetup, Pune 28th november 2015
Cephfs - Red Hat Openstack and Ceph meetup, Pune 28th november 2015Cephfs - Red Hat Openstack and Ceph meetup, Pune 28th november 2015
Cephfs - Red Hat Openstack and Ceph meetup, Pune 28th november 2015
 
Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...
 
Kubernetes 1.12 Update and Container Security with Liz Rice
Kubernetes 1.12 Update and Container Security with Liz RiceKubernetes 1.12 Update and Container Security with Liz Rice
Kubernetes 1.12 Update and Container Security with Liz Rice
 
Network Support for Cloud Applications - Felix Cuadrado
Network Support for Cloud Applications - Felix CuadradoNetwork Support for Cloud Applications - Felix Cuadrado
Network Support for Cloud Applications - Felix Cuadrado
 
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...
 
Tracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingTracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracing
 
The Evolution of Distributed Systems on Kubernetes
The Evolution of Distributed Systems on KubernetesThe Evolution of Distributed Systems on Kubernetes
The Evolution of Distributed Systems on Kubernetes
 
Kubecon 2019_eu-k8s-secrets-csi
Kubecon 2019_eu-k8s-secrets-csiKubecon 2019_eu-k8s-secrets-csi
Kubecon 2019_eu-k8s-secrets-csi
 

Similar to On-node resource manager for containerized HPC workloads

Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
MayaData Inc
 
Next gen software operations models in the cloud
Next gen software operations models in the cloudNext gen software operations models in the cloud
Next gen software operations models in the cloud
Aarno Aukia
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
Steve Wong
 
[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments
WSO2
 
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
Srijan Technologies
 
Red Hat Container Strategy
Red Hat Container StrategyRed Hat Container Strategy
Red Hat Container Strategy
Red Hat Events
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
Peter Clapham
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio, Inc.
 
Container Landscape in 2019
Container Landscape in 2019Container Landscape in 2019
Container Landscape in 2019
Anusha Ragunathan
 
Containerization Principles Overview for app development and deployment
Containerization Principles Overview for app development and deploymentContainerization Principles Overview for app development and deployment
Containerization Principles Overview for app development and deployment
Dr Ganesh Iyer
 
Mcroservices with docker kubernetes, goang and grpc, overview
Mcroservices with docker kubernetes, goang and grpc, overviewMcroservices with docker kubernetes, goang and grpc, overview
Mcroservices with docker kubernetes, goang and grpc, overview
Faculty of Technical Sciences, University of Novi Sad
 
{code} and containers
{code} and containers{code} and containers
{code} and containers
{code} by Dell EMC
 
Kubernetes the deltatre way the basics - introduction to containers and orc...
Kubernetes the deltatre way   the basics - introduction to containers and orc...Kubernetes the deltatre way   the basics - introduction to containers and orc...
Kubernetes the deltatre way the basics - introduction to containers and orc...
Rauno De Pasquale
 
Data harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacingData harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacing
Access Innovations, Inc.
 
Mirantis OpenStack 5.0 Overview
Mirantis OpenStack 5.0 OverviewMirantis OpenStack 5.0 Overview
Mirantis OpenStack 5.0 Overview
Mirantis
 
Kubernetes for Beginners
Kubernetes for BeginnersKubernetes for Beginners
Kubernetes for Beginners
DigitalOcean
 
PuppetConf 2016: Using Puppet with Kubernetes and OpenShift – Diane Mueller, ...
PuppetConf 2016: Using Puppet with Kubernetes and OpenShift – Diane Mueller, ...PuppetConf 2016: Using Puppet with Kubernetes and OpenShift – Diane Mueller, ...
PuppetConf 2016: Using Puppet with Kubernetes and OpenShift – Diane Mueller, ...
Puppet
 
Maintaining an up to date application stack (in a containerized world)
Maintaining an up to date application stack (in a containerized world)Maintaining an up to date application stack (in a containerized world)
Maintaining an up to date application stack (in a containerized world)
Christoph Görn
 
Netflix Architecture and Open Source
Netflix Architecture and Open SourceNetflix Architecture and Open Source
Netflix Architecture and Open Source
All Things Open
 
GCCP JSCOE Session 2
GCCP JSCOE Session 2GCCP JSCOE Session 2
GCCP JSCOE Session 2
GDSC
 

Similar to On-node resource manager for containerized HPC workloads (20)

Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
 
Next gen software operations models in the cloud
Next gen software operations models in the cloudNext gen software operations models in the cloud
Next gen software operations models in the cloud
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 
[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments
 
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
 
Red Hat Container Strategy
Red Hat Container StrategyRed Hat Container Strategy
Red Hat Container Strategy
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Container Landscape in 2019
Container Landscape in 2019Container Landscape in 2019
Container Landscape in 2019
 
Containerization Principles Overview for app development and deployment
Containerization Principles Overview for app development and deploymentContainerization Principles Overview for app development and deployment
Containerization Principles Overview for app development and deployment
 
Mcroservices with docker kubernetes, goang and grpc, overview
Mcroservices with docker kubernetes, goang and grpc, overviewMcroservices with docker kubernetes, goang and grpc, overview
Mcroservices with docker kubernetes, goang and grpc, overview
 
{code} and containers
{code} and containers{code} and containers
{code} and containers
 
Kubernetes the deltatre way the basics - introduction to containers and orc...
Kubernetes the deltatre way   the basics - introduction to containers and orc...Kubernetes the deltatre way   the basics - introduction to containers and orc...
Kubernetes the deltatre way the basics - introduction to containers and orc...
 
Data harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacingData harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacing
 
Mirantis OpenStack 5.0 Overview
Mirantis OpenStack 5.0 OverviewMirantis OpenStack 5.0 Overview
Mirantis OpenStack 5.0 Overview
 
Kubernetes for Beginners
Kubernetes for BeginnersKubernetes for Beginners
Kubernetes for Beginners
 
PuppetConf 2016: Using Puppet with Kubernetes and OpenShift – Diane Mueller, ...
PuppetConf 2016: Using Puppet with Kubernetes and OpenShift – Diane Mueller, ...PuppetConf 2016: Using Puppet with Kubernetes and OpenShift – Diane Mueller, ...
PuppetConf 2016: Using Puppet with Kubernetes and OpenShift – Diane Mueller, ...
 
Maintaining an up to date application stack (in a containerized world)
Maintaining an up to date application stack (in a containerized world)Maintaining an up to date application stack (in a containerized world)
Maintaining an up to date application stack (in a containerized world)
 
Netflix Architecture and Open Source
Netflix Architecture and Open SourceNetflix Architecture and Open Source
Netflix Architecture and Open Source
 
GCCP JSCOE Session 2
GCCP JSCOE Session 2GCCP JSCOE Session 2
GCCP JSCOE Session 2
 

Recently uploaded

Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
safelyiotech
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
ShulagnaSarkar2
 
YAML crash COURSE how to write yaml file for adding configuring details
YAML crash COURSE how to write yaml file for adding configuring detailsYAML crash COURSE how to write yaml file for adding configuring details
YAML crash COURSE how to write yaml file for adding configuring details
NishanthaBulumulla1
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
Rakesh Kumar R
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.
AnkitaPandya11
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
GohKiangHock
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
Sven Peters
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
ISH Technologies
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
Alberto Brandolini
 
Project Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdfProject Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdf
Karya Keeper
 
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
kalichargn70th171
 
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfTop Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
VALiNTRY360
 
Liberarsi dai framework con i Web Component.pptx
Liberarsi dai framework con i Web Component.pptxLiberarsi dai framework con i Web Component.pptx
Liberarsi dai framework con i Web Component.pptx
Massimo Artizzu
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
Marcin Chrost
 

Recently uploaded (20)

Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
 
YAML crash COURSE how to write yaml file for adding configuring details
YAML crash COURSE how to write yaml file for adding configuring detailsYAML crash COURSE how to write yaml file for adding configuring details
YAML crash COURSE how to write yaml file for adding configuring details
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
Project Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdfProject Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdf
 
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
 
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfTop Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
 
Liberarsi dai framework con i Web Component.pptx
Liberarsi dai framework con i Web Component.pptxLiberarsi dai framework con i Web Component.pptx
Liberarsi dai framework con i Web Component.pptx
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
 

On-node resource manager for containerized HPC workloads

  • 1. Singularity On-node Resource Manager for Containerized HPC Workloads Geoffroy Vallee, Carlos Eduardo Arango Gutierrez, Cedric Clerget Canopie HPC - Super Computing 2019
  • 2. Introduction ● HPC workloads are becoming more complex, e.g., MPI+X ○ Multiple runtimes are now running side by side ○ Most runtimes assume by default that all available resources can be used ○ Containers are becoming a standard way to run scientific workloads ● HPC systems are becoming more complex ○ Current trend is for bigger, more powerful compute nodes ○ Multiple accelerators per node Complex resource partitioning and sharing problems The community started to discuss about internal resource manager for applications
  • 3. Challenges The venue of new systems, containers and workloads requiring multiple runtimes create new challenges ➔ It creates a deep hierarchy of execution contexts ◆ processes & threads running on the host ◆ containers and everything running inside the containers ➔ How can we precisely partition resources between? ➔ How can we share hardware resources? ➔ How can we assign hardware resources to process/threads that are potentially running deep in the hierarchy?
  • 4. Goals ➔ Design a solution to be deployed on current and future systems ◆ Integration with global scheduler ◆ Support legacy scientific workloads, i.e., MPI ◆ Support upcoming scientific workloads, e.g., MPI+X and beyond ➔ Scalable resource management that covers containers ➔ Hierarchical architecture ◆ Scalable ◆ Compatible with existing infrastructure ◆ Compatible with existing programming languages (e.g., MPI) and runtimes ➔ Minimize modifications to existing implementations, specifications and standards
  • 5. PMIx Overview ● Process Management Interface for eXascale ● PMIx is both a specification/standard and an implementation (OpenPMIx) ● Provides all the building blocks for the implementation of distributed asynchronous runtimes ○ Client/server model for scalability ○ Interface to resource managers ○ Event based architecture ● Now available for most of major MPI implementations ● Use by industry, government, academia
  • 6. Singularity Overview ● Specially designed for HPC ● Container solution that runs strictly under the user’s ID (no privileged access required) ● No heavy daemon required to run on the node (no system noise) ● Works well with parallel file system ● Let users ○ Package their application and data; bring and execute the package to virtually any system ○ Low to no-overhead while executing applications ○ Encrypt their containers ○ Sign and verify their containers
  • 7. Architecture Overview ➔ Add a new PMIx thread to the Singularity runtime ➔ Interface Singularity with the global resource manager ➔ Ease the mapping of resources
  • 8. Extensions to PMIx ➔ From a standard point-of-view ◆ PMIx already provides all the required interfaces and semantics ◆ Points of interest ● Semantic mapping between PMIx and Singularity namespaces ● Tool interface ● Allocation API: PMIX_ALLOC_NEW and PMIX_ALLOC_EXTEND can be leveraged to have a flexible way to add more resources to containers (when required resources are not known upfront) ➔ From an implementation point-of-view ◆ Need to extend an implementation to support Singularity containers ● It would be useful to have the PMIx runtime being capable of managing containers ● OpenPMIx is a good candidate
  • 9. Extensions to Singularity Add a PMIx thread to the Singularity runtime ➔ Act as a PMIx client ◆ What resources are being assigned to the container? ➔ Act as a PMIx server ◆ How resources are assigned to what is running in the container
  • 10. Extensions to Singularity (2) ➔ Ultimately creates a hierarchy of PMIx clients/servers ◆ Easier to map, allocate and manage resources assigned and used by containers ◆ Gives a better control over assigning processes and threads running inside the container onto resources (e.g., MPI binding) ➔ Leverage the PMIx tool interface for debugging and profiling ➔ Enable runtime coordination ◆ Leverage PMIx event, communication and synchronization mechanisms
  • 11. Conclusion We propose an extension of the Singularity containers runtime and PMIx implementation to provide: ➔ A scalable resource manager for complex workloads that include containers ➔ A solution compatible with existing solutions and infrastructures ➔ New capabilities to assign, share and/or partition resources between different runtimes (e.g., MPI, OpenMP, Singularity)