SlideShare a Scribd company logo
1 of 8
www.univa.com
www.univa.com
Ian Lumb
Solutions Architect
SUSE, Booth #1681
SC17, Denver, CO
Managing Containerized
HPC and AI Workloads on
TSUBAME3.0
www.univa.com
2
www.univa.com
3
www.univa.com
Copyright © Univa Corporation, 2017. All Rights Reserved. Internal Use Only. 4
TSUBAME 3.0 - Compute Node Overview
A compute-node:
■ 256 GB DDR4 RAM
■ 2 TB SSDs
■ 2x 14 cores
■ 4x GPUs
■ 4x HFI (1000 Gbps)
⇒ This is what they call
a “fat compute node”
www.univa.com
Copyright © Univa Corporation, 2017. All Rights Reserved. Internal Use Only. 5
TSUBAME 3.0 - The Challenges
12.2 PetaFLOPS within only 20 racks or 540 compute
nodes
➢ It is the smallest >10 PFLOPS machine in the world
➢ Wasted/unreachable resources (parts of a node) have a
much bigger impact on such a “small” cluster
➢ Performance is also highly dependent on the job-
placement due to additional resources, such as GPUs
and HFI-devices (the closer, the better)
➢ It needs smart and flexible partitioning to ensure a
high utilization
www.univa.com
6
TSUBAME 3.0 - UGE Enhancements
▪ Core Bindings
▪ Enhanced PE support and strategies
▪ RSMAPS
▪ Enhanced PE support and chaining
▪ Docker
▪ Define unique but known container hostnames
▪ Configure Infiniband device in the container
▪ Map all job users into the container
▪ Provide execution host and Docker container hostnames to the job
www.univa.com
Copyright © Univa Corporation, 2017. All Rights Reserved. Internal Use Only. 7
Putting it all together …
qsub -l docker,docker_images="*ubuntu:14.04*"
-l gpu=1,hfi=1,hosts=1
-xd ‘--device=/dev/gpu${gpu(0)}:/dev/gpu,
--device=/dev/hfi${hfi(0)}:/dev/hfi’
-xd ‘--hostname ${hosts(0)}’
-binding one_socket_balanced:4
-pe rr 4 jobscript.sh
No matter the host-OS, the application
gets whatever OS it needs (if they run
their own docker-repo, the image can
even be prepared however they need it)
Each PE-task will get 1
GPU and 1 HFI device
(both with the same ID,
i.e. in the same “location”)
and a unique hostname
No matter which devices
are granted, the
application only sees
/dev/gpu and /dev/hfi
inside the container and
can use them directly
without any performance
penalty!
Even if the RSMAP would occupy 7 cores
per GPU, we only want 4 per PE-task.
Thus leaving room for other jobs, which do
not need a GPU or HFI. Also, we only go
on one socket per host.
Container gets a unique, known (!) hostname
www.univa.com
8

More Related Content

What's hot

In-core compression: how to shrink your database size in several times
In-core compression: how to shrink your database size in several timesIn-core compression: how to shrink your database size in several times
In-core compression: how to shrink your database size in several times
Aleksander Alekseev
 

What's hot (19)

Deccan RubyConf 2016 - Lighning Talk - SpiceRub
Deccan RubyConf 2016 - Lighning Talk - SpiceRubDeccan RubyConf 2016 - Lighning Talk - SpiceRub
Deccan RubyConf 2016 - Lighning Talk - SpiceRub
 
Glusterfs session #2 1 layer above disk filesystems
Glusterfs session #2   1 layer above disk filesystemsGlusterfs session #2   1 layer above disk filesystems
Glusterfs session #2 1 layer above disk filesystems
 
Integrating GlusterFS with iSCSI Target
Integrating GlusterFS with iSCSI TargetIntegrating GlusterFS with iSCSI Target
Integrating GlusterFS with iSCSI Target
 
Hydra
HydraHydra
Hydra
 
Resource Management with Systemd and cgroups
Resource Management with Systemd and cgroupsResource Management with Systemd and cgroups
Resource Management with Systemd and cgroups
 
High Performance OSM Data Manipulation With Osmium - State of the Map 2013
High Performance OSM Data Manipulation With Osmium - State of the Map 2013High Performance OSM Data Manipulation With Osmium - State of the Map 2013
High Performance OSM Data Manipulation With Osmium - State of the Map 2013
 
Sharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika DhananjaySharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika Dhananjay
 
Distributed Data Processing Workshop - SBU
Distributed Data Processing Workshop - SBUDistributed Data Processing Workshop - SBU
Distributed Data Processing Workshop - SBU
 
Open Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETOpen Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNET
 
UBD (LaserVault Universal Backup Device )
UBD (LaserVault Universal Backup Device )UBD (LaserVault Universal Backup Device )
UBD (LaserVault Universal Backup Device )
 
GPU Performance Prediction Using High-level Application Models
GPU Performance Prediction Using High-level Application ModelsGPU Performance Prediction Using High-level Application Models
GPU Performance Prediction Using High-level Application Models
 
Cassandra4hadoop
Cassandra4hadoopCassandra4hadoop
Cassandra4hadoop
 
In-core compression: how to shrink your database size in several times
In-core compression: how to shrink your database size in several timesIn-core compression: how to shrink your database size in several times
In-core compression: how to shrink your database size in several times
 
Caffe + H2O - By Cyprien noel
Caffe + H2O - By Cyprien noelCaffe + H2O - By Cyprien noel
Caffe + H2O - By Cyprien noel
 
Integrating openSUSE Ceph Block Device & OpenStack
Integrating openSUSE Ceph Block Device & OpenStack Integrating openSUSE Ceph Block Device & OpenStack
Integrating openSUSE Ceph Block Device & OpenStack
 
Life as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan RossiLife as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan Rossi
 
STOR2RRD presentation from Common CZ/SK 2015
STOR2RRD presentation from Common CZ/SK 2015STOR2RRD presentation from Common CZ/SK 2015
STOR2RRD presentation from Common CZ/SK 2015
 
Rear
RearRear
Rear
 
Alexander Ignatyev "MapReduce infrastructure"
Alexander Ignatyev "MapReduce infrastructure"Alexander Ignatyev "MapReduce infrastructure"
Alexander Ignatyev "MapReduce infrastructure"
 

Similar to Managing Containerized HPC and AI Workloads on TSUBAME3.0

Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentation
Amrut Patil
 
Docker Introduction + what is new in 0.9
Docker Introduction + what is new in 0.9 Docker Introduction + what is new in 0.9
Docker Introduction + what is new in 0.9
Jérôme Petazzoni
 

Similar to Managing Containerized HPC and AI Workloads on TSUBAME3.0 (20)

Hadoop installation
Hadoop installationHadoop installation
Hadoop installation
 
State of Containers and the Convergence of HPC and BigData
State of Containers and the Convergence of HPC and BigDataState of Containers and the Convergence of HPC and BigData
State of Containers and the Convergence of HPC and BigData
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase client
 
Introduction to docker
Introduction to dockerIntroduction to docker
Introduction to docker
 
Docker 0.11 at MaxCDN meetup in Los Angeles
Docker 0.11 at MaxCDN meetup in Los AngelesDocker 0.11 at MaxCDN meetup in Los Angeles
Docker 0.11 at MaxCDN meetup in Los Angeles
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
Run wordcount job (hadoop)
Run wordcount job (hadoop)Run wordcount job (hadoop)
Run wordcount job (hadoop)
 
Docker Intro at the Google Developer Group and Google Cloud Platform Meet Up
Docker Intro at the Google Developer Group and Google Cloud Platform Meet UpDocker Intro at the Google Developer Group and Google Cloud Platform Meet Up
Docker Intro at the Google Developer Group and Google Cloud Platform Meet Up
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configuration
 
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentation
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation
 
Docker Introduction, and what's new in 0.9 — Docker Palo Alto at RelateIQ
Docker Introduction, and what's new in 0.9 — Docker Palo Alto at RelateIQDocker Introduction, and what's new in 0.9 — Docker Palo Alto at RelateIQ
Docker Introduction, and what's new in 0.9 — Docker Palo Alto at RelateIQ
 
Docker Introduction + what is new in 0.9
Docker Introduction + what is new in 0.9 Docker Introduction + what is new in 0.9
Docker Introduction + what is new in 0.9
 
NFD9 - Matt Peterson, Data Center Operations
NFD9 - Matt Peterson, Data Center OperationsNFD9 - Matt Peterson, Data Center Operations
NFD9 - Matt Peterson, Data Center Operations
 
DCSF19 Hardening Docker daemon with Rootless mode
DCSF19 Hardening Docker daemon with Rootless modeDCSF19 Hardening Docker daemon with Rootless mode
DCSF19 Hardening Docker daemon with Rootless mode
 
[DockerCon 2019] Hardening Docker daemon with Rootless mode
[DockerCon 2019] Hardening Docker daemon with Rootless mode[DockerCon 2019] Hardening Docker daemon with Rootless mode
[DockerCon 2019] Hardening Docker daemon with Rootless mode
 
Exp-3.pptx
Exp-3.pptxExp-3.pptx
Exp-3.pptx
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 
Introduction to Docker (and a bit more) at LSPE meetup Sunnyvale
Introduction to Docker (and a bit more) at LSPE meetup SunnyvaleIntroduction to Docker (and a bit more) at LSPE meetup Sunnyvale
Introduction to Docker (and a bit more) at LSPE meetup Sunnyvale
 

More from Ian Lumb

Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...
Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...
Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...
Ian Lumb
 
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
Ian Lumb
 
Docker 101 - all about Docker containers
Docker 101 - all about Docker containers Docker 101 - all about Docker containers
Docker 101 - all about Docker containers
Ian Lumb
 
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...
Ian Lumb
 

More from Ian Lumb (12)

Towards Deep Learning from Twitter for Improved Tsunami Alerts and Advisories
Towards Deep Learning from Twitter for Improved Tsunami Alerts and AdvisoriesTowards Deep Learning from Twitter for Improved Tsunami Alerts and Advisories
Towards Deep Learning from Twitter for Improved Tsunami Alerts and Advisories
 
Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...
Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...
Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...
 
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
 
Drilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Drilling Deep with Machine Learning as an Enterprise Enabled Micro ServiceDrilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Drilling Deep with Machine Learning as an Enterprise Enabled Micro Service
 
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
 
Docker 101 - all about Docker containers
Docker 101 - all about Docker containers Docker 101 - all about Docker containers
Docker 101 - all about Docker containers
 
High Performance Computing in the Cloud?
High Performance Computing in the Cloud?High Performance Computing in the Cloud?
High Performance Computing in the Cloud?
 
VoDcast Slides: The Rise in Popularity of Apache Spark
VoDcast Slides: The Rise in Popularity of Apache SparkVoDcast Slides: The Rise in Popularity of Apache Spark
VoDcast Slides: The Rise in Popularity of Apache Spark
 
Bright Topics Webinar April 15, 2015 - Modernized Monitoring for Cluster and ...
Bright Topics Webinar April 15, 2015 - Modernized Monitoring for Cluster and ...Bright Topics Webinar April 15, 2015 - Modernized Monitoring for Cluster and ...
Bright Topics Webinar April 15, 2015 - Modernized Monitoring for Cluster and ...
 
Utilizing Public AND Private Clouds with Bright Cluster Manager
Utilizing Public AND Private Clouds with Bright Cluster ManagerUtilizing Public AND Private Clouds with Bright Cluster Manager
Utilizing Public AND Private Clouds with Bright Cluster Manager
 
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero DowntimeHow to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
 
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...
 

Recently uploaded

Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 

Recently uploaded (20)

What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
 
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of TransformationWSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
 
WSO2CON2024 - Why Should You Consider Ballerina for Your Next Integration
WSO2CON2024 - Why Should You Consider Ballerina for Your Next IntegrationWSO2CON2024 - Why Should You Consider Ballerina for Your Next Integration
WSO2CON2024 - Why Should You Consider Ballerina for Your Next Integration
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
WSO2CON 2024 - Building a Digital Government in Uganda
WSO2CON 2024 - Building a Digital Government in UgandaWSO2CON 2024 - Building a Digital Government in Uganda
WSO2CON 2024 - Building a Digital Government in Uganda
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
WSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - KanchanaWSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - Kanchana
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and ApplicationsWSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
 

Managing Containerized HPC and AI Workloads on TSUBAME3.0

  • 1. www.univa.com www.univa.com Ian Lumb Solutions Architect SUSE, Booth #1681 SC17, Denver, CO Managing Containerized HPC and AI Workloads on TSUBAME3.0
  • 4. www.univa.com Copyright © Univa Corporation, 2017. All Rights Reserved. Internal Use Only. 4 TSUBAME 3.0 - Compute Node Overview A compute-node: ■ 256 GB DDR4 RAM ■ 2 TB SSDs ■ 2x 14 cores ■ 4x GPUs ■ 4x HFI (1000 Gbps) ⇒ This is what they call a “fat compute node”
  • 5. www.univa.com Copyright © Univa Corporation, 2017. All Rights Reserved. Internal Use Only. 5 TSUBAME 3.0 - The Challenges 12.2 PetaFLOPS within only 20 racks or 540 compute nodes ➢ It is the smallest >10 PFLOPS machine in the world ➢ Wasted/unreachable resources (parts of a node) have a much bigger impact on such a “small” cluster ➢ Performance is also highly dependent on the job- placement due to additional resources, such as GPUs and HFI-devices (the closer, the better) ➢ It needs smart and flexible partitioning to ensure a high utilization
  • 6. www.univa.com 6 TSUBAME 3.0 - UGE Enhancements ▪ Core Bindings ▪ Enhanced PE support and strategies ▪ RSMAPS ▪ Enhanced PE support and chaining ▪ Docker ▪ Define unique but known container hostnames ▪ Configure Infiniband device in the container ▪ Map all job users into the container ▪ Provide execution host and Docker container hostnames to the job
  • 7. www.univa.com Copyright © Univa Corporation, 2017. All Rights Reserved. Internal Use Only. 7 Putting it all together … qsub -l docker,docker_images="*ubuntu:14.04*" -l gpu=1,hfi=1,hosts=1 -xd ‘--device=/dev/gpu${gpu(0)}:/dev/gpu, --device=/dev/hfi${hfi(0)}:/dev/hfi’ -xd ‘--hostname ${hosts(0)}’ -binding one_socket_balanced:4 -pe rr 4 jobscript.sh No matter the host-OS, the application gets whatever OS it needs (if they run their own docker-repo, the image can even be prepared however they need it) Each PE-task will get 1 GPU and 1 HFI device (both with the same ID, i.e. in the same “location”) and a unique hostname No matter which devices are granted, the application only sees /dev/gpu and /dev/hfi inside the container and can use them directly without any performance penalty! Even if the RSMAP would occupy 7 cores per GPU, we only want 4 per PE-task. Thus leaving room for other jobs, which do not need a GPU or HFI. Also, we only go on one socket per host. Container gets a unique, known (!) hostname