18-20 MAY 2021
BOV1021
Jeff Reser, jeff.reser@suse.com
5 Paths to High Performance Computing
Copyright © SUSE 2021
Agenda
Supercomputing Path
Enterprise Path
Cloud Path
Containers Path
Edge Path
HPC Infrastructure Best Practices
Challenges:
Complexity: “Composing a working HPC environment is difficult, time-consuming, requiring experts.”
Time to Solution: “I need to maximize application performance, scale workloads and minimize overhead.”
Maintenance: “My IT staff doesn’t have time to update and test all the different software components.”
Best practices:
• Understand your HPDA (high-performance data analytics) workloads, requirements and data locations
• Evaluate container usage across environments
• Consider hybrid environments to lower costs and avoid HPC cloud lock-in
HPC Market
• HPC ROI is very high: $507 in revenue and $47 in average profit (or cost savings) per dollar invested in HPC¹
• Worldwide HPC revenue is expected to exceed $19 billion by 2024¹ (CAGR of 6.8%, down from 8.7% due to Covid impacts)
• The exascale race will drive new technologies
• Big data combined with HPC is creating new solutions and adding many new buyers to the HPC space
• Cloud-based HPC services are growing with demand for faster answers to complex problems
• Storage systems will become increasingly critical
• Cloud computing for HPC workloads will grow faster than on-premises HPC
• AI/ML will grow faster than any other segment
¹ Hyperion Research, November 2020
Data Science to Business Results
Practical HPC
Data → Data Analytics → Data Science & Modeling → AI/ML/DL → Predictive Modeling → Inference and Practical AI
Supercomputing
IHV-driven (HPE/Cray/SGI, Fujitsu, Dell, Bull)
International research labs (NCAR/NOAA,
Kongsberg, ICHEC)
Academia (U. of Bristol, U. of Leicester)
High ROI
Supercomputer HPC Market
Commercial OS share in the Top500 (covering the 107 supercomputers on the list that run a commercial OS) shows SUSE/CLE with the largest share, 41%
The rest of the list runs unspecified “Linux”, TOSS or CentOS²
Commercial OS Share in Top500 (represents 107 supercomputers in the list):
SUSE/CLE 41%
RHEL 32%
Bullx 17%
Ubuntu 10%
² Top500 Supercomputer Report, November 2020
Supercomputer HPC Market
Analysis of vendor-supported OS usage in the top 200 supercomputers shows that SUSE/CLE has the largest share²
Non-commercial OS in the top 200 include CentOS (19%), unspecified “Linux” (35%), TOSS (3%), VEOS (1%) and others (Sunway, Kylin, Scientific)
CentOS and TOSS gain share among “smaller” supercomputers²
Commercial OS Share in Top 200 (represents top 200 supercomputers in the list):
SUSE/CLE 19%
RHEL 13%
Ubuntu 5%
Bullx 5%
² Top500 Supercomputer Report, November 2020
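The difference between a majority and the largest share can be checked from the chart values. A minimal Python sketch, assuming the Top 200 percentages above are shares of all 200 systems (the commercial and non-commercial slices together sum to roughly 100%), re-normalizes the commercial slices to shares of the vendor-supported group:

```python
from __future__ import annotations

# Commercial (vendor-supported) OS shares in the Top 200, in percent of all
# 200 systems, as read from the chart (Top500 report, November 2020).
top200_commercial = {"SUSE/CLE": 19, "RHEL": 13, "Ubuntu": 5, "Bullx": 5}


def share_within(group: dict[str, float]) -> dict[str, float]:
    """Re-normalize each slice to its percentage of the group total."""
    total = sum(group.values())
    return {name: 100 * value / total for name, value in group.items()}


shares = share_within(top200_commercial)
leader = max(shares, key=shares.get)

for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:9s} {pct:5.1f}%")
print(f"{leader} has a majority: {shares[leader] > 50}")

# Top500 chart: 41% of the 107 commercially supported systems.
suse_top500_systems = round(0.41 * 107)
print(f"SUSE/CLE systems in the Top500 chart: about {suse_top500_systems}")
```

SUSE/CLE comes out at about 45% of vendor-supported systems in the top 200: the largest share, but not a majority. The same holds for the 41% Top500 figure, which corresponds to about 44 of the 107 systems.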
Rise of Arm-based Supercomputing
NVIDIA announced its first data center CPU, an Arm-based processor that NVIDIA says will deliver 10x the performance of today’s fastest servers on the most complex AI and HPC workloads
Catalyst UK established what were, at launch, the largest Arm-based supercomputer deployments in the world: more than 12,000 Arm-based cores running across three universities, with strong benchmark results
Exascale race driving new technologies
Unprecedented set of tools to address big,
complex problems
• New approaches for drug response prediction
• Global to micro weather forecasting
• Extreme-scale cosmological simulations
• Discovering materials for the creation of more
efficient organic solar cells
Leibniz Supercomputing Centre
SuperMUC-NG runs SUSE on Lenovo ThinkSystem: 6,500 nodes, 26.9 petaflops peak
Geophysicists use earthquake simulation software to investigate seismic waves beneath Earth’s surface
Calculations involved in this kind of simulation are so complex that they push even supercomputers to their limits
Better prepare for future seismic events
Tokyo Institute of Technology
TSUBAME, touted as the “supercomputer for everyone”, is used for:
• Medicine and simulating human organs
• Predicting traffic congestion or share prices
• Earthquake warnings and weather forecasting
• Analysis of social phenomena
• Solving the mystery of baseball trajectories
NASA Asteroid Threat Assessment Project
Pleiades supercomputer at NASA Ames Research Center
NASA’s asteroid research is shared with scientists at universities, national labs, and government agencies
Develop assessment and response plans that look at damage to infrastructure, warning times, evacuations, and other options for protecting lives and property
Enterprise
HPC infrastructure being driven into
enterprises with in-house HPC
AI/ML and other containerized applications
driving the need for stronger platforms
Tangible benefits are being realized
HPC Infrastructure Is Being Driven Into Enterprise Markets¹
Competitive forces are driving companies to aim more complex questions at their data structures and push business operations closer to real time
HPC is moving in-house for scalability, ultrafast data movement and very large memory systems
Manufacturing (including consumer goods) is the largest commercial segment
Financial services is the second-largest commercial segment
Segment shares shown in the chart: 17%, 25%, 58%
Commercial
• Financial services
• Large product manufacturing
• Bio-sciences
• Energy
• Consumer product manufacturing
• Retail
• Chemical
• Media & Entertainment
• Electronics
• Transportation
• Other commercial
Academic
• Academic
• Not-for-profit
Government
• National agency
• National security
• National research lab
• State or local government
¹ Intersect360 Research, August 2020
Household Appliance Design
Reduce development time for innovative ovens, washing machines and refrigerators
Simulate and test more product variations in a virtual design space before committing to a physical prototype
Achieve higher sales, reduce manufacturing costs and strengthen the brand
Manufacturing & Materials Science
Discovery: Discovers materials faster; mines databases for “recipes”
Analysis: Predicts the right compound combinations
Modeling: Helps refine materials for optimum performance
Cloud
Makes HPC resources available to scientists
around the world
HPC Bursting for on-demand processing
CrunchYard, Azure, AWS, GCP partnerships
HPC in the Cloud Market
HPC cloud spend is projected to reach ~$9B by 2024¹
Covid-19 has accelerated cloud adoption
HPC in the cloud is expected to grow more than 2.5 times faster than the on-prem HPC server market
Major drivers include an increasing number of AI/ML workloads in the cloud and improvements in the ease of use and deployment of HPC jobs
2020 HPC Cloud Forecast¹
| Year | HPC cloud spend ($M) |
|---|---|
| 2018 | 2,466 |
| 2019 | 3,910 |
| 2020 | 4,300 |
| 2021 | 5,300 |
| 2022 | 6,400 |
| 2023 | 7,600 |
| 2024 | 8,800 |
¹ Hyperion Research, November 2020
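Growth rates can be derived from the forecast table. A short Python sketch, using only the table values above, computes the compound annual growth rate (CAGR) for 2020 to 2024 and the year-over-year change:

```python
# HPC cloud spend in $M, from the 2020 HPC Cloud Forecast table above
# (Hyperion Research, November 2020).
cloud_spend = {2018: 2466, 2019: 3910, 2020: 4300, 2021: 5300,
               2022: 6400, 2023: 7600, 2024: 8800}


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Year-over-year change for each year that has a predecessor in the table.
yoy = {year: cloud_spend[year] / cloud_spend[year - 1] - 1
       for year in sorted(cloud_spend) if year - 1 in cloud_spend}

forecast_cagr = cagr(cloud_spend[2020], cloud_spend[2024], 2024 - 2020)
print(f"2020-2024 CAGR: {forecast_cagr:.1%}")
for year, growth in yoy.items():
    print(f"{year}: {growth:+.1%}")
```

With these figures the 2020 to 2024 CAGR works out to roughly 19.6%, while the 2018 to 2019 step alone is nearly 59%, so the start year matters when quoting a growth rate from this chart.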
All-in vs. Bursting
HPC “all-in” the cloud
• Includes the head, compute and storage nodes, with no hardware infrastructure to maintain
• Optimized cost and performance for scale-out applications
HPC bursting to hybrid/public clouds
• Address changing capacity needs
• Extend HPC jobs to the cloud for on-demand scale and flexibility
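The bursting model can be illustrated with a toy placement policy. The sketch below is hypothetical (the `Job` fields, the in-order greedy rule and the cloud core cap are assumptions, not a SUSE or cloud-provider API): jobs run on-prem while local cores last, cloud-eligible overflow bursts up to a cap, and everything else waits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    cloud_eligible: bool  # e.g. no data-locality or licensing constraints


def plan_bursting(queue: list[Job], free_onprem_cores: int,
                  max_cloud_cores: int) -> tuple[list[str], list[str], list[str]]:
    """Hypothetical greedy placement, in queue order.

    Jobs run on-prem while local cores last; cloud-eligible overflow bursts
    to the cloud up to a core cap (a stand-in for a budget); the rest wait.
    """
    onprem: list[str] = []
    cloud: list[str] = []
    waiting: list[str] = []
    cloud_used = 0
    for job in queue:
        if job.cores <= free_onprem_cores:
            onprem.append(job.name)
            free_onprem_cores -= job.cores
        elif job.cloud_eligible and cloud_used + job.cores <= max_cloud_cores:
            cloud.append(job.name)
            cloud_used += job.cores
        else:
            waiting.append(job.name)
    return onprem, cloud, waiting


queue = [Job("cfd-a", 512, True), Job("genomics", 256, False),
         Job("cfd-b", 512, True), Job("render", 128, True)]
onprem, cloud, waiting = plan_bursting(queue, free_onprem_cores=768,
                                       max_cloud_cores=600)
print("on-prem:", onprem)   # cfd-a and genomics fill the 768 local cores
print("cloud:  ", cloud)    # cfd-b bursts
print("waiting:", waiting)  # render would exceed the 600-core cloud cap
```

Passing `free_onprem_cores=0` models the all-in case, where every cloud-eligible job is placed in the cloud.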
Hyperloop
Magnetic accelerators and compressed-air bearings remove frictional forces
The design anticipates operational maintenance and variables such as earthquakes, power outages and passenger fluctuations
The design must be safe, reliable, affordable and self-powered
HPC simulations enable iterative design changes and show the results of each iteration
Pharmaceuticals & Drug Research
Experimentation: Predicts treatment results accurately
Discovery: Improves drug design and discovery
Treatment: Enables better disease management and precision medicine
Containers
Kubernetes’ default scheduler lacks high-performance batch scheduling capabilities such as gang scheduling
HPC container platforms (Rancher, Singularity, UberCloud)
Data scientists rely on containers across environments
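One scheduling capability HPC needs is gang (all-or-nothing) placement: a tightly coupled MPI job cannot make progress until all of its ranks are running, whereas Kubernetes’ default scheduler places pods one at a time (add-on schedulers such as Volcano exist to fill this gap). A minimal Python sketch of the idea, with hypothetical node names and core counts:

```python
from __future__ import annotations


def gang_schedule(ranks: int, cores_per_rank: int,
                  free_cores: dict[str, int]) -> dict[str, int] | None:
    """All-or-nothing placement of an MPI-style job (illustrative sketch).

    Returns a node -> rank-count plan only if every rank fits. Otherwise it
    returns None and leaves free_cores untouched, so no partial job holds
    resources while waiting for its remaining ranks.
    """
    plan: dict[str, int] = {}
    remaining = ranks
    # Visit nodes with the most free cores first to pack ranks onto fewer nodes.
    for node, cores in sorted(free_cores.items(), key=lambda kv: -kv[1]):
        fit = min(cores // cores_per_rank, remaining)
        if fit > 0:
            plan[node] = fit
            remaining -= fit
        if remaining == 0:
            break
    if remaining > 0:
        return None  # the whole gang cannot start, so start nothing
    for node, count in plan.items():
        free_cores[node] -= count * cores_per_rank  # commit the allocation
    return plan


nodes = {"n1": 32, "n2": 16, "n3": 8}  # hypothetical free cores per node
print(gang_schedule(10, 4, nodes))  # 10 ranks x 4 cores
print(gang_schedule(5, 4, nodes))   # needs 20 cores, only 16 left
print(nodes)
```

A pod-at-a-time scheduler facing the second job could start four of its five ranks and leave them holding 16 cores while the fifth waits; the all-or-nothing check avoids that partial allocation.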
What Makes a Good Container Solution for HPC?
• Compatible with HPC environments
• Designed for performance (support for GPU acceleration)
• Works with many resource managers (Slurm, Altair, Bright Computing, Univa)
• GUI support for managing jobs and services
• Secure and compliant
Simplify access to accelerator technology
OpenACC is a directive-based
programming model designed to provide
performance and portability for CPUs,
GPUs and other accelerators
Leading Producer of DNA Sequencers
Illumina, the market leader in genetic sequencing, conducts research that drives work on disease, drug reaction and agriculture
Provided a migration path for a system with 25K HPC computing cores and 20 petabytes of storage
Traditional HPC distributes compute over many nodes and works on a smaller number of large files, with data transferred between multiple nodes
Open source container platforms are designed to be simple, fast, and secure, and are optimized for HPC workloads
Edge
K3ai: infrastructure for edge devices with the full capability of a Kubernetes cluster
Edge HPDA workloads are driving the need for stronger platforms
AI Infrastructure for Edge
• K3ai (https://github.com/kf5i/k3ai) is a lightweight, fully
automated AI infrastructure-in-a-box solution that
allows anyone to experiment quickly with Kubeflow
pipelines
• Infrastructure for edge devices with full capability of a
Kubernetes cluster
• Includes Kubernetes based on K3s from Rancher,
Kubeflow pipelines, NVIDIA GPU support, NVIDIA
Triton inference server and more
• Runs on Arm and x86
Sustainable Energy & Utilities
Diversify and increase the pace of innovation while improving infrastructure efficiency
Increase responsiveness by aligning product supply, and achieve safe and compliant operations
Modeling reduces cost and risk
Limits environmental impacts; optimizes supply and demand; enables proactive maintenance
Enables smart allocation of energy resources
Automotive Design & Manufacturing
Design: Enables effective design simulations
Connected vehicles: Powers advanced safety
features; cloud services for available data; driver
monitoring
Manufacturing: Speeds up design, increases
quality and reduces R&D costs
SUSE Linux Enterprise High Performance Computing
Bundles popular HPC tools and libraries
All packages supported by SUSE
Available for x86-64 and Arm64
Flexible release schedule
SUSE HPC Module
Base OS & Architecture: SLE HPC 15 (HPC Module with subscription) supports AArch64 & x86-64
Management Tools:
• SLURM (a highly scalable workload manager)
• conman (the console manager)
• genders (static cluster configuration database)
• lua-lmod (environment module system)
• munge (authentication service for user credentials)
• mrsh (set of remote shell programs using munge)
• pdsh (high performance, parallel remote shell utility)
• prun (script-based wrapper for launching parallel jobs)
• clustduct (script that glues the genders database to dnsmasq)
• powerman (cluster power control)
• cpuid (obtain CPU details)
I/O Services:
• Memkind (heap manager for memory control)
• RASDaemon (RAS reports via kernel tracing)
Parallel Libraries:
• boost (portable C++ source libraries)
• gsl (GNU Scientific Library)
• FFTW3 (library for computing Fourier transforms)
• PETSc (suite of data structures and routines for partial differential equations)
• ScaLAPACK (scalable linear algebra package)
• hypre (parallel solvers for sparse linear systems)
• mumps (multifrontal massively parallel sparse direct solver)
• Scotch (graph & mesh/hypergraph partitioning, graph clustering)
• trilinos (large-scale, complex multi-physics & scientific problems)
Serial Libraries:
• OpenBLAS (optimized BLAS library)
• SuperLU (supernodal sparse direct solver)
Compilers:
• GCC (GNU Compiler Collection, includes C++ & Fortran)
Monitoring Tools:
• ganglia (ganglia monitoring core)
• ganglia-web (ganglia web front-end)
• icinga2 (monitoring platform core)
• prometheus slurm exporter (Prometheus exporter for Slurm performance metrics)
I/O Libraries:
• adios (adaptable I/O system for exascale)
• hdf5 (data model, library and file format for managing data)
• netcdf (Unidata network Common Data Form)
• netcdf-cxx4 (netCDF C++ libraries and utilities)
• netcdf-fortran (netCDF Fortran libraries)
• pnetcdf (parallel I/O library for netCDF file access)
Message Passing Interface:
• openmpi3/openmpi4 (Message Passing Interface implementation)
• mvapich2 (MPI over InfiniBand, Omni-Path, RoCE, iWARP)
• mpich (high-performance portable implementation of MPI)
• openPMIx (Process Management Interface for Exascale standard)
Development Tools:
• metis (serial graph partitioning & matrix ordering)
• hwloc (hardware locality)
• python-numpy (scientific computing with Python)
• python-scipy (software for math, science and engineering)
Performance Tools:
• mpiP (lightweight MPI profiler)
• imb (Intel MPI benchmarks)
• papi (Performance Application Programming Interface)
SUSE Package Hub:
• robinhood (policy engine to monitor filesystem contents)
• singularity (HPC application containers)
• charliecloud (lightweight user-defined HPC stack)
• clustershell (scalable cluster admin Python framework)
• warewulf (scalable systems management suite for high-performance clusters)
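Tools in this module such as pdsh and Slurm (and ClusterShell from Package Hub) address nodes with compact hostlist expressions like node[01-04]. A simplified Python sketch of the expansion, handling only a single bracket group; the real tools also accept comma-separated host lists, multiple bracket groups and other syntax that this sketch does not:

```python
from __future__ import annotations

import re

# prefix[ranges]suffix, with exactly one bracket group (simplification).
_HOSTLIST = re.compile(r"^(?P<prefix>[^\[\]]*)\[(?P<ranges>[^\[\]]+)\](?P<suffix>[^\[\]]*)$")


def expand_hostlist(expr: str) -> list[str]:
    """Expand a single-bracket hostlist such as 'node[01-03,07]'.

    Zero padding is taken from the start of each range, so '01-03' yields
    '01', '02', '03'. A plain host name without brackets is returned as-is.
    """
    match = _HOSTLIST.match(expr)
    if match is None:
        return [expr]
    prefix, suffix = match["prefix"], match["suffix"]
    hosts: list[str] = []
    for part in match["ranges"].split(","):
        start, _, end = part.partition("-")
        end = end or start  # a single number like '07' is a one-item range
        width = len(start)
        for n in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


print(expand_hostlist("node[01-03,07]"))
print(expand_hostlist("gpu[8-10]-ib"))
print(expand_hostlist("login1"))
```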
Ref Architectures
| Base OS | openSUSE Leap 15.2, CentOS 8.3 | SLE HPC 15 |
|---|---|---|
| Architecture | AArch64, x86-64 | AArch64, x86-64 |
| Administrative Tools | conman, docs, examples, genders, lmod-defaults, lmod, losf, mrsh, nhc, pdsh, prun, test-suite | conman, genders, lua-lmod, pdsh, prun, clustduct, mrsh, cpuid, powerman |
| Provisioning Tools | warewulf | SLE HPC |
| Resource Management | SLURM, OpenPBS, pmix, magpie | SLURM, Munge (Altair PBS available separately) |
| Runtimes | charliecloud, singularity | OCR (Singularity & Charliecloud available via Package Hub) |
| I/O Services | Memkind, RASDaemon | Memkind, RASDaemon |
| Parallel Libraries | boost, gsl, FFTW, hypre, mfem, mumps, opencoarrays, petsc, scotch, scalapack, slepc, superlu_dist, trilinos | GSL, FFTW, Hypre, MUMPS, PETSc, ScaLAPACK, Scotch, Trilinos |
| Serial/Threaded Libraries | R, openblas, plasma, superlu | OpenBLAS, SuperLU |
| I/O Libraries | adios, HDF5, NetCDF, NetCDF-cxx, NetCDF-F, pnetcdf, sionlib | HDF5 (pHDF5), NetCDF, NetCDF-cxx, NetCDF-FORTRAN |
| Compiler Families | GNU (GCC, G++, GFORTRAN), Intel/Arm compiler compatibility | GNU (GCC, G++, GFORTRAN) |
| MPI Runtime/Transport Families | Intel MPI Compat, libfabric, mpich, mvapich2, openmpi4, ucx | mvapich2, openMPI3, openMPI4, mpich, OpenPMIx |
| Development Tools | EasyBuild, autoconf, automake, cmake, hwloc, metis, libtool, python3-mpi4py, python3-numpy, python3-scipy, spack, valgrind | Hwloc, python3-NumPy, python3-SciPy, Metis |
| Performance Tools | dimemas, extrae, geopm, imb, likwid, msr-safe, omb, papi, pdtoolkit, scalasca, score-p, tau | PAPI, mpiP, imb |
| Lustre Packages | lustre-client (Whamcloud) | Whamcloud available separately |
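Both stacks in the table use Slurm for resource management and Lmod for environment modules. A hedged Python sketch that renders a minimal batch script: the `#SBATCH` options, `module` commands and `srun` are standard Slurm and Lmod usage, but the module names, job name and application command in the example are placeholders, not package names confirmed for either stack:

```python
from __future__ import annotations


def slurm_batch_script(job_name: str, nodes: int, tasks_per_node: int,
                       walltime: str, modules: list[str], command: str) -> str:
    """Render a minimal Slurm batch script for an MPI job (sketch).

    The #SBATCH options are standard sbatch flags. Module names and the
    application command are placeholders for what a site's Lmod setup and
    users actually provide.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",  # HH:MM:SS wall-clock limit
        "",
        "module purge",  # start from a clean Lmod environment
        *(f"module load {name}" for name in modules),
        "",
        # srun inherits the job allocation and by default launches
        # nodes x tasks-per-node tasks; with PMI/PMIx support in Slurm and
        # the MPI library, no host file is needed.
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


print(slurm_batch_script("cfd-run", nodes=4, tasks_per_node=32,
                         walltime="02:00:00", modules=["gnu", "openmpi4"],
                         command="./solver input.dat"))
```

The rendered text can be saved to a file and submitted with `sbatch`.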
In Conclusion
SUSE delivers an HPC platform that makes HPC solutions easier to develop, faster and more manageable
Businesses around the world are recognizing that a Linux-based HPC infrastructure is vital to supporting the analytics applications of tomorrow
HPC is no longer just for scientific research; it is being adopted across banking, healthcare, utilities and manufacturing
“I don’t want to be the embarrassment of the galaxy, to have had the power to deflect an asteroid, and then not, and end up going extinct. We’d be the laughing-stock of the aliens of the cosmos if that were the case.”
Neil deGrasse Tyson, astrophysicist
Thank you
© 2021 SUSE LLC. All Rights Reserved. SUSE and the SUSE logo are registered
trademarks of SUSE LLC in the United States and other countries. All third-party
trademarks are the property of their respective owners.
General Disclaimer: This document is not to be construed as a promise by any
participating company to develop, deliver, or market a product. It is not a commitment to
deliver any material, code, or functionality, and should not be relied upon in making
purchasing decisions. SUSE makes no representations or warranties with respect to the
contents of this document, and specifically disclaims any express or implied warranties
of merchantability or fitness for any particular purpose. The development, release, and
timing of features or functionality described for SUSE products remains at the sole
discretion of SUSE. Further, SUSE reserves the right to revise this document and to
make changes to its content, at any time, without obligation to notify any person or entity
of such revisions or changes. All SUSE marks referenced in this presentation are
trademarks or registered trademarks of SUSE, LLC, Inc. in the United States and other
countries. All third-party trademarks are the property of their respective owners.
For more information, contact SUSE at:
+1 800 796 3700 (U.S./Canada)
+49 (0)911-740 53-0 (Worldwide)
Maxfeldstrasse 5
90409 Nuremberg
www.suse.com
Other AI/HPC Projects
• FuseML (https://github.com/fuseml/) aims to
provide an MLOps framework that integrates the
AI/ML tools of your choice.
• Phoebe (https://github.com/SUSE/phoebe)
adds basic artificial intelligence capabilities to
the Linux OS.
• Nearly 10,000 GitHub projects for HPC
Reference Architecture
(Diagram) A head node and a failover head node run SUSE Linux Enterprise High Performance Computing and the HPC Module for administration: Slurm workload management, parallel task executor, parallel remote commands, power management for clusters, kernel RAS monitoring, file system library, console access, CPU identification, Portable Hardware Locality, user environment management, terminal management, memory allocation, remote shell programs, authentication service, Message Passing Interface, and process launching and monitoring timers. Four compute clusters, each with a login node and compute nodes, connect over a high-bandwidth fabric to high-performance storage: Lustre (metadata and Lustre nodes), with Spectrum Scale (GPFS), XFS and pNFS also shown.

5 Paths to HPC - SUSE

  • 1.
    18-20 MAY 2021 BOV1021 JeffReser, jeff.reser@suse.com 5 Paths to High Performance Computing
  • 2.
    Copyright © SUSE2021 Edge Path Containers Path Cloud Path Enterprise Path Supercomputing Path Agenda 3 7 15 19 24 28
  • 3.
    Copyright © SUSE2021 HPC Infrastructure Best Practices Understand your HPDA workloads, requirements and data locations Complexity: “Composing a working HPC environment is difficult, time-consuming, requiring experts.” Evaluate container usage across environments Time to Solution: “I need to maximize application performance, scale workloads and minimize overhead.” Consider hybrid environments – lower costs and avoid HPC cloud lock-in Maintenance: “My IT staff doesn’t have time to update and test all the different software components.” 4
  • 4.
    Copyright © SUSE2021 • HPC ROI is very high - $507 revenue per dollar; $47 average profit (or cost savings) per dollar invested in HPC1 • Worldwide HPC revenue expected to reach over $19 billion by 20241 (CAGR 6.8%, down from 8.7% due to Covid impacts) • The exascale race will drive new technologies • Big data combined with HPC creating new solutions, adding many new buyers to the HPC space • Growing cloud-based HPC services with demand for faster answers to complex problems • Storage systems will increasingly become more critical • Cloud computing for HPC workloads will grow faster • AI/ML will grow faster than everything else HPC Market 5 1 HyperionResearch,November2020
  • 5.
    Copyright © SUSE2021 Data Science to Business Results Practical HPC 6 Data Data Analytics Data Science & Modeling AI/ML/DL Predictive Modeling Inference and Practical AI
  • 6.
    7 Copyright © SUSE2021 Supercomputing IHV-driven (HPE/Cray/SGI, Fujitsu, Dell, Bull) International research labs (NCAR/NOAA, Kongsberg, ICHEC) Academia (U. of Bristol, U. of Leicester) High ROI
  • 7.
    Copyright © SUSE2021 Commercial OS share in Top500 (represents 107 supercomputers in the list) shows a SUSE/CLE majority of 41% The rest are running either unspecified “Linux”, TOSS or CentOS2 8 Supercomputer HPC Market 2 Top500 SupercomputerReport, November2020 SUSE/CLE 41% Ubuntu 10% Bullx 17% RHEL 32% Commercial OS Share in Top500 (represents 107 supercomputers in the list)
  • 8.
    Copyright © SUSE2021 Analysis of top 200 supercomputers’ vendor-supported OS usage reveals that SUSE/CLE has the majority2 Other non-commercial OS include CentOS (19%), “Linux” (35%), TOSS (3%), VEOS (1%) and others (Sunway, Kylin, Scientific) CentOS & TOSS gain share in “smaller” supercomputers2 9 Supercomputer HPC Market 2 Top500 SupercomputerReport, November2020 SUSE/CLE 19% Ubuntu 5% RHEL 13% Commercial OS Share in Top 200 (represents top 200 supercomputers in the list) Bullx 5%
  • 9.
    Copyright © SUSE2021 NVIDIA announced its first data center CPU, an Arm-based processor that will deliver 10x the performance of today’s fastest servers on the most complex AI and HPC workloads Catalyst UK established largest Arm-based supercomputer deployments in the world – more than 12,000 Arm-based cores running across three universities with strong benchmark results Rise of Arm-based Supercomputing
  • 10.
    Copyright © SUSE2021 Exascale race driving new technologies 11 Unprecedented set of tools to address big, complex problems • New approaches for drug response prediction • Global to micro weather forecasting • Extreme-scale cosmological simulations • Discovering materials for the creation of more efficient organic solar cells
  • 11.
    Copyright © SUSE2021 Leibniz Supercomputing Centre SuperMUC-NG runs SUSE on Lenovo ThinkSystem, 6500 nodes, 26.9 PetaFlops Geophysicists use earthquake simulation software to investigate seismic waves beneath Earth’s surface Calculations involved in this kind of simulation are so complex that they push even supercomputers to their limits Better prepare for future seismic events 12
  • 12.
    Copyright © SUSE2021 Tokyo Institute of Technology TSUBAME touted as the “supercomputer for everyone” • Medicine and simulating human organs • Predicting traffic congestion or share prices • Earthquake warnings and weather forecasting • Social phenomenon analysis • Solving the mystery of baseball trajectories 13
  • 13.
    Copyright © SUSE2021 NASA Asteroid Threat Assessment Project Pleiades supercomputer @ Ames NASA’s asteroid research is shared with scientists at universities, national labs, and government agencies Develop assessment and response plans to look at damage to infrastructure, warning times, evacuations, and other options for protecting lives and property
  • 14.
    15 Copyright © SUSE2021 Enterprise HPC infrastructure being driven into enterprises with in-house HPC AI/ML and other containerized applications driving the need for stronger platforms Tangible benefits are being realized
  • 15.
    Copyright © SUSE2021 16 HPC Infrastructure Is Being Driven Into Enterprise Markets1 Competitive forces are driving companies to aim more complex questions at their data structures & push business operations closer to real time HPC moving in-house for scalability, ultrafast data movement and very large memory systems. Manufacturing is the largest commercial segment (incl. consumer goods) Financial Services is the second largest commercial segment 17% 25% 58% • Financial services • Large product manufacturing • Bio-sciences • Energy • Consumer product manufacturing • Retail Commercial • Chemical • Media & Entertainment • Electronics • Transportation • Other commercial • Academic • Not-for-profit Academic Government • National agency • National security • National research lab • State or local government 1 Intersect360 Research, August 2020
  • 16.
    Copyright © SUSE2021 Household Appliance Design Reduce time in the development of innovative ovens, washing machines and refrigerators Simulate and test more product variations in a virtual design space before committing to a physical prototype Achieve higher sales, reducing manufacturing costs and improving the brand. 17
  • 17.
    Copyright © SUSE2021 Manufacturing & Materials Science Discovery: Discovers materials faster; mine databases for “recipes” Analysis: Predicts right compound combinations Modeling: Helps refine materials for optimum performance 18
  • 18.
    19 Copyright © SUSE2021 Cloud Makes HPC resources available to scientists around the world HPC Bursting for on-demand processing CrunchYard, Azure, AWS, GCP partnerships
  • 19.
    Copyright © SUSE2021 HPC cloud spend projected to reach ~$9B by 20241 Covid-19 has accelerated cloud adoption HPC in the cloud is expected to grow more than 2.5 times faster than the on-prem HPC server market Major drivers include increased number of AI/ML workloads in the cloud and improvements in ease of use and deployment of HPC jobs 20 HPC in the Cloud Market 2020 HPC Cloud Forecast1 1 HyperionResearch,November2020 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000 2018 2019 2020 2021 2022 2023 2024 2,466 3,910 4,300 5,300 6,400 7,600 8,800 ($M) HPC Cloud Spend
  • 20.
    Copyright © SUSE2021 HPC “all-in” the cloud • Includes the head, compute and storage nodes, with no hardware infrastructure to maintain • Optimized cost and performance for scale-out applications HPC bursting to hybrid/public clouds • Address changing capacity needs • Extend HPC jobs to the Cloud for on-demand scale and flexibility Local Network Cloud Local Network Cloud All-in vs. Bursting
  • 21.
    Copyright © SUSE2021 Hyperloop Magnetic accelerators and compressed air bearings remove frictional forces Design anticipates operational maintenance and variables such as earthquakes, power outages and passenger fluctuations Design must be safe, reliable, affordable and self-powered HPC simulations enable iterative design changes and show results
  • 22.
    Copyright © SUSE2021 Pharmaceuticals & Drug Research Experimentation: Predicts treatment results accurately Discovery: Improves drug design and discovery Treatment: Enables better disease management; enables precision medicine 23
  • 23.
    24 Copyright © SUSE2021 Containers Kubernetes lacks high-performance scheduling capabilities HPC Container platforms (Rancher, Singularity, UberCloud) Data Scientists rely on containers across environments
  • 24.
    Copyright © SUSE2021 What Makes a Good Container Solution for HPC? • Compatible with HPC environments • Designed for performance (support for accelerated GPUs) • Works with many resource managers (Slurm, Altair, Bright Computing, Univa) • GUI support for managing jobs and services • Secure and compliant 25
  • 25.
    Copyright © SUSE2021 Simplify access to accelerator technology OpenACC is a directive-based programming model designed to provide performance and portability for CPUs, GPUs and other accelerators
  • 26.
    Copyright © SUSE2021 Leading Producer of DNA Sequencers Market leader in genetic sequencing, Illumina’s research drives work on disease, drug reaction and agriculture Provided a migration path for a system with 25K HPC computing cores and 20 petabytes of storage Traditional HPC distributes compute over many nodes, smaller number of large files with data transfer between multiple nodes Open source container platforms are designed to be simple, fast, and secure – optimized for HPC workloads
  • 27.
    28 Copyright © SUSE2021 Edge K3ai infrastructure for edge devices and full capability of a Kubernetes cluster Edge HPDA workloads driving the need for stronger platforms
  • 28.
    Copyright © SUSE2021 AI Infrastructure for Edge • K3ai (https://github.com/kf5i/k3ai) is a lightweight, fully automated AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines • Infrastructure for edge devices with full capability of a Kubernetes cluster • Includes Kubernetes based on K3s from Rancher, Kubeflow pipelines, NVIDIA GPU support, NVIDIA Triton inference server and more • Runs on Arm and x86 29
  • 29.
    Copyright © SUSE2021 Sustainable Energy & Utilities Diversity and increase pace of innovation while improving infrastructure efficiencies Increase responsiveness by aligning product supply and achieve safe and compliant operations Modeling reduces cost and risk Limit environmental impacts; optimizes supply/demand; proactive maintenance Enables smart allocation of energy resources 30
  • 30.
    Copyright © SUSE2021 Automotive Design & Manufacturing Design: Enables effective design simulations Connected vehicles: Powers advanced safety features; cloud services for available data; driver monitoring Manufacturing: Speeds up design, increases quality and reduces R&D costs 31
  • 31.
    32 Copyright © SUSE2021 SUSE Linux Enterprise High Performance Computing Bundles popular HPC tools and libraries All packages supported by SUSE Available for x86-64 and Arm64 Flexible release schedule
  • 32.
    Copyright © SUSE2021 SUSE HPC Module 33 Base OS & Architecture SLE HPC 15 (HPC Module w/subscription) supports Aarch64 & x86-64 Management Tools SLURM (a highly scalable workload manager) conman (the console manager) genders (static cluster configuration database) lua-lmod (environment module system) munge (authentication service for user credentials) mrsh (set of remote shell programs using munge) pdsh (high performance, parallel remote shell utility) prun (script-based wrapper for launching parallel jobs) clustduct (script which glues the genders database to dnsmasq) powerman (cluster power control) cpuid (obtain CPU details) I/O Services Memkind (heap manager for memory control) RASDaemon (RAS reports via kernel tracing) Parallel Libraries boost (portable C++ source library) gsl (GNU Scientific Library) FFTW3 (Fourier transforms computing library) PETSc (suite of data structures for partial differentiated equations) ScaLAPACK (scalable linear algebra package) hypre (parallel solvers for sparse linear systems) mumps (multifrontal massively parallel sparse direct solver) Scotch (graph & mesh/hypergraph partitioning, graph clustering) trilinos (large-scale complex multi-physics & scientific problems) Serial Libraries OpenBLAS (optimized BLAS library) Superlu (super nodal sparse direct solver) Compilers GCC (GNU Compiler Collection includes C++ & Fortran) Monitoring Tools ganglia (ganglia monitoring core) ganglia-web (ganglia web front-end) icinga2 (monitoring platform core) prometheus slurm exporter (Prometheus exporter for perf metrics) I/O Libraries adios (adaptable I/O system for exascale) hdf5 (data model, library and file format for managing data) netcdf (Unidata network Common Data Form) netcdf-cxx4 (C++ libraries and utilities) netcdf-fortran (netCDF Fortran libraries) pnetcdf (parallel I/O library for NetCDF file access) Message Passing Interface openmpi3/openmpi4 (Message Passing Interface implementation) mvapich2 (MPI over InfiniBand, Omni-Path, RoCE, 
iWARP) mpich (HP portable implementation of MPI) openPMIx (Process Management Interface Exascale standard) Development Tools metis (serial graph partitioning & matrix ordering) hwloc (hardware locality) python-numpy (scientific computing with Python) python-scipy (software for math, science and engineering) Performance Tools mpiP (lightweight MPI profiler) imb (Intel MPI benchmarks) papi (Performance Application Programming Interface) SUSE Package Hub robinhood (policy engine to monitor filesystem contents) singularity (HPC application containers) charliecloud (lightweight user-defined HPC stack) clustershell (scalable cluster admin Python framework) warewulf (scalable systems management suite for HP clusters)
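The HPC Module pieces above are typically used together. Slurm allocates nodes, Lmod (lua-lmod) selects a compiler and MPI stack, and srun launches the MPI ranks. The Python sketch below renders such a batch script. It is illustrative only: the module names "gnu" and "openmpi4", the program "./my_solver" and the resource values are hypothetical. It also assumes that Lmod is initialized in batch shells and that the MPI library was built with Slurm PMI/PMIx support, so that srun can launch it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JobSpec:
    """Hypothetical description of one batch job."""
    job_name: str
    nodes: int
    tasks_per_node: int
    time_limit: str            # Slurm time format, e.g. "HH:MM:SS"
    modules: tuple[str, ...]   # Lmod modules to load (names are assumptions)
    command: str               # MPI program to launch with srun


def render_sbatch(spec: JobSpec) -> str:
    """Return the text of an sbatch script for the given job."""
    if spec.nodes < 1 or spec.tasks_per_node < 1:
        raise ValueError("nodes and tasks_per_node must be positive")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={spec.job_name}",
        f"#SBATCH --nodes={spec.nodes}",
        f"#SBATCH --ntasks-per-node={spec.tasks_per_node}",
        f"#SBATCH --time={spec.time_limit}",
        "",
        "module purge",  # start from a clean Lmod environment
    ]
    if spec.modules:
        lines.append("module load " + " ".join(spec.modules))
    # Inside the allocation, srun starts nodes * tasks_per_node tasks by default.
    lines.append(f"srun {spec.command}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    job = JobSpec(
        job_name="cfd-run",
        nodes=4,
        tasks_per_node=32,
        time_limit="02:00:00",
        modules=("gnu", "openmpi4"),   # hypothetical module names
        command="./my_solver input.dat",  # hypothetical program
    )
    print(render_sbatch(job), end="")
```

Writing the output to a file and passing it to sbatch would submit the job. Keeping the rendering in a function makes it easy to vary node counts, for example in a scaling study.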
Ref Architectures
Base OS | openSUSE Leap 15.2, CentOS 8.3 | SLE HPC 15
Architecture | AArch64, x86-64 | AArch64, x86-64
Administrative Tools | conman, docs, examples, genders, lmod-defaults, lmod, losf, mrsh, nhc, pdsh, prun, test-suite | conman, genders, lua-lmod, pdsh, prun, clustduct, mrsh, cpuid, powerman
Provisioning Tools | warewulf | SLE HPC
Resource Management | SLURM, OpenPBS, pmix, magpie | SLURM, Munge (Altair PBS available separately)
Runtimes | charliecloud, singularity | OCR (Singularity & Charliecloud available via Package Hub)
I/O Services | Memkind, RASDaemon | Memkind, RASDaemon
Parallel Libraries | boost, gsl, FFTW, hypre, mfem, mumps, opencoarrays, petsc, scotch, scalapack, slepc, superlu_dist, trilinos | GSL, FFTW, Hypre, MUMPS, PETSc, ScaLAPACK, Scotch, Trilinos
Serial/Threaded Libraries | R, openblas, plasma, superlu | OpenBLAS, SuperLU
I/O Libraries | adios, HDF5, NetCDF, NetCDF-cxx, NetCDF-F, pnetcdf, sionlib | HDF5 (pHDF5), NetCDF, NetCDF-cxx, NetCDF-FORTRAN
Compiler Families | GNU (GCC, G++, GFORTRAN), Intel/Arm compiler compatibility | GNU (GCC, G++, GFORTRAN)
MPI Runtime/Transport Families | Intel MPI Compat, libfabric, mpich, mvapich2, openmpi4, ucx | mvapich2, openMPI3, openMPI4, mpich, OpenPMIx
Development Tools | EasyBuild, autoconf, automake, cmake, hwloc, metis, libtool, python3-mpi4py, python3-numpy, python3-scipy, spack, valgrind | Hwloc, python3-NumPy, python3-SciPy, Metis
Performance Tools | dimemas, extrae, geopm, imb, likwid, msr-safe, omb, papi, pdtoolkit, scalasca, score-p, tau | PAPI, mpiP, imb
Lustre Packages | lustre-client (Whamcloud) | Whamcloud available separately
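One way to read the reference-architecture comparison is row by row. This small Python sketch does that for the "Parallel Libraries" row. The names are copied from that row and lowercased so that, for example, "PETSc" and "petsc" match. The result only reflects what the row lists: the HPC Module package list also includes boost, which this row omits for SLE HPC 15.

```python
from __future__ import annotations

# Cell contents copied from the "Parallel Libraries" row of the comparison.
LEAP_CENTOS = ("boost, gsl, FFTW, hypre, mfem, mumps, opencoarrays, petsc, "
               "scotch, scalapack, slepc, superlu_dist, trilinos")
SLE_HPC_15 = "GSL, FFTW, Hypre, MUMPS, PETSc, ScaLAPACK, Scotch, Trilinos"


def parse_row(row: str) -> set[str]:
    """Split a comma-separated table cell into a set of lowercased names."""
    return {name.strip().lower() for name in row.split(",") if name.strip()}


def compare_rows(left: str, right: str) -> dict[str, list[str]]:
    """Return packages shared by both cells and those unique to each side."""
    a, b = parse_row(left), parse_row(right)
    return {
        "shared": sorted(a & b),
        "left_only": sorted(a - b),
        "right_only": sorted(b - a),
    }


if __name__ == "__main__":
    for key, names in compare_rows(LEAP_CENTOS, SLE_HPC_15).items():
        print(f"{key}: {', '.join(names) or '(none)'}")
```

Run as-is, it reports eight shared libraries (fftw, gsl, hypre, mumps, petsc, scalapack, scotch, trilinos). Five appear only in the openSUSE Leap/CentOS column (boost, mfem, opencoarrays, slepc, superlu_dist), and none are unique to SLE HPC 15.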
In Conclusion
• HPC is no longer just for scientific research; it is being adopted across banking, healthcare, utilities and manufacturing
• Businesses around the world are recognizing that a Linux-based HPC infrastructure is vital to supporting the analytics applications of tomorrow
• SUSE delivers an HPC platform that makes those solutions better: easier to develop, faster and more manageable
“I don’t want to be the embarrassment of the galaxy, to have had the power to deflect an asteroid, and then not, and end up going extinct. We’d be the laughingstock of the aliens of the cosmos if that were the case.”
Neil deGrasse Tyson, astrophysicist
Thank you
© 2021 SUSE LLC. All Rights Reserved. SUSE and the SUSE logo are registered trademarks of SUSE LLC in the United States and other countries. All third-party trademarks are the property of their respective owners.
General Disclaimer: This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of SUSE, LLC, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.
For more information, contact SUSE at:
+1 800 796 3700 (U.S./Canada)
+49 (0)911-740 53-0 (Worldwide)
Maxfeldstrasse 5
90409 Nuremberg
www.suse.com
Other AI/HPC Projects
• FuseML (https://github.com/fuseml/) aims to provide an MLOps framework that integrates the AI/ML tools of your choice.
• Phoebe (https://github.com/SUSE/phoebe) adds basic artificial intelligence capabilities to the Linux OS.
• Nearly 10,000 projects on GitHub are related to HPC.
Reference Architecture
[Diagram: a head node and a failover head node running SUSE Linux Enterprise High Performance Computing administer four compute clusters. Each cluster has a login node and seven compute nodes, and the clusters are connected over a high-bandwidth fabric to Lustre high-performance storage (two Lustre metadata nodes and three Lustre nodes). Spectrum Scale (GPFS), XFS and pNFS are also shown. HPC Module functions labeled in the diagram: Slurm workload management, parallel task executor, parallel remote commands, power management for clusters, kernel RAS monitoring, file system library, console access, CPU identification, portable hardware locality, user environment management, terminal management, memory allocation, remote shell programs, authentication service, Message Passing Interface, process launching and monitoring, and timers.]
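In this architecture, parallel remote commands (pdsh) and Slurm address whole groups of compute nodes at once, usually through bracketed host ranges such as node[01-07]. pdsh -w and Slurm node lists accept this syntax directly, and scontrol show hostnames expands it. The Python sketch below shows what that expansion means. It is a simplified, hypothetical helper: it handles a single bracket group with comma-separated numbers or ranges and preserves zero padding. The real hostlist grammar in these tools supports more forms, such as several comma-separated expressions, and the node names here are made up.

```python
from __future__ import annotations

import re

# prefix[body]suffix with exactly one bracket group, e.g. "node[01-03,07]"
_BRACKET = re.compile(r"^([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)$")


def expand_hostlist(expr: str) -> list[str]:
    """Expand a simple host range like "node[01-03,07]" into host names.

    Supports one bracket group containing comma-separated numbers or
    lo-hi ranges. Zero padding follows the width of the lower bound.
    A plain name without brackets is returned unchanged.
    """
    match = _BRACKET.match(expr)
    if match is None:
        if "[" in expr or "]" in expr:
            raise ValueError(f"unsupported host expression: {expr!r}")
        return [expr]

    prefix, body, suffix = match.groups()
    hosts: list[str] = []
    for part in body.split(","):
        lo, sep, hi = part.partition("-")
        if not sep:
            hi = lo  # a single number such as "07"
        if not (lo.isdigit() and hi.isdigit()):
            raise ValueError(f"bad range {part!r} in {expr!r}")
        start, end = int(lo), int(hi)
        if end < start:
            raise ValueError(f"descending range {part!r} in {expr!r}")
        width = len(lo)  # keep leading zeros: "01" gives width 2
        hosts.extend(f"{prefix}{n:0{width}d}{suffix}" for n in range(start, end + 1))
    return hosts


if __name__ == "__main__":
    print(expand_hostlist("node[01-03,07]"))  # hypothetical compute node names
```

For "node[01-03,07]" this yields node01, node02, node03 and node07. Because the width comes from the lower bound, "cn[8-10]" yields cn8, cn9 and cn10, with no padding.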