In this deck from the 2019 Stanford HPC Conference, Jay Kruemcke of SUSE presents: SUSE Linux for HPC - It Just Keeps Getting Better.
"SUSE has dramatically improved our HPC solutions over the past year including adding additional capabilities, longer service life and lower prices. Come to this session to understand how you can leverage SUSE Linux for HPC to build and maintain your HPC environment easier and faster."
As a member of the SUSE Linux Enterprise Server product management team, Jay is responsible for the SUSE Linux server products for High Performance Computing, 64-bit ARM systems, and SUSE Linux for IBM Power servers. Jay has built an extensive career in product management, including using social media for client collaboration, product positioning, driving future product directions, and evangelizing the capabilities and future directions of dozens of enterprise products.
Learn more: http://hpcadvisorycouncil.com/events/2019/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
SUSE Linux for HPC - It Just Keeps Getting Better
1. SUSE High Performance Computing:
It just keeps getting better
Jay Kruemcke
Sr. Product Manager, HPC, ARM, POWER
jayk@suse.com
2. The HPC universe is expanding in new ways
CAGR 2016-2021:
• 5.6% Supercomputer (>$500K)
• 5.0% Divisional ($250K-$500K)
• 6.3% Departmental ($100K-$250K)
• 6.3% Workgroup (<$100K)
• HPC is a growth market, with a growing
recognition of strategic value
• HPC ROI is very high
• $551 on average revenue per dollar
invested in HPC
• $52 on average profit (or cost savings) per
dollar invested in HPC
• Key use cases:
• HPC in the cloud (incl. HPCaaS)
• Cognitive computing (incl. AI/ML/DL)
• HPDA (High Performance Data Analysis)
• IoT
• Key applications:
• Modeling and simulation
• Data analytics
Source: Hyperion Research, June 2017
SUSE High Performance Computing, 2/18/2019
3. HPC Customer Pain Points
Complexity
“Composing a working HPC environment is difficult, time-consuming, requiring experts.”
• Clusters are hard to use and manage as they become more complex in heterogeneous environments
• Storage access time and data management are becoming new bottlenecks
Maintenance
“My IT staff doesn’t have time to update and test all the different software components.”
• Better management software is needed, and the deployment approach needs to be updated to leverage HPC and cloud infrastructure
• Stack components are provided by multiple vendors, making them more challenging to maintain
Time to Solution
“I need to maximize application performance, scale workloads, and minimize overhead.”
• Parallel software is lacking, with many applications needing a major re-design
• Stack components are provided by multiple vendors, making management more challenging
• The market is segmented into commercial and scientific, and there is not enough collaboration
5. SUSE is the preferred HPE partner for
Linux, HPC, OpenStack and Cloud
Foundry solutions
SUSE technology is embedded on every
HPE ProLiant Server to power the
intelligent provisioning feature
6. Arm SoC partners driving HPC adoption in the modern data center
Catalyst UK initiative with HPE and SUSE
HPE Apollo 70 first SUSE “Yes” certification for
an Arm server
Optimize infrastructure costs with increased
server density on latest 64-bit Arm processors
7. Goal: Propel the Arm HPC ecosystem and exascale computing in the UK
• More than 12,000 Arm-based cores running across three universities
• 64 Apollo 70 systems per site
• Two 32 core Cavium ThunderX2 processors per system
• Running SUSE Linux Enterprise for High Performance Computing
Catalyst UK project:
HPE, Arm, SUSE, and three leading UK universities establish one of
the largest Arm-based supercomputer deployments in the world
8. Cray Linux Environment (CLE) is based on
SUSE Linux
Arm-powered Cray delivered to a UK
consortium
Cray has a majority share of the Top500 sites
9. Isambard – UK Tier 2 HPC service from GW4
• Cray “Scout” XC50 series system
- 10,000+ Armv8 cores – Cavium ThunderX2
- Aries interconnect
- Cray Linux Environment based on SUSE Linux
10. Scalable system framework in cooperation with
OpenHPC, designed to work for small clusters to
the largest supercomputers
Scale and balance for compute- and data-intensive applications
Strong platform for AI and visualization
11. AI/ML/DL workloads
Jointly define scope of Lenovo HPC stack
using SUSE HPC componentry
LiCO adaptation (Lenovo Intelligent
Computing Orchestration)
Barcelona Supercomputing Center
12. SuperMUC Petascale system runs SUSE
on Lenovo ThinkSystem
Geophysicists use earthquake
simulation software to investigate
seismic waves beneath Earth’s surface
Calculations involved in this kind of
simulation are so complex that they push
even supercomputers to their limits
13. Bright Cluster Manager supports SUSE,
enabling customers to deploy, manage
and monitor SLES clusters using the
familiar Bright interface
Bright Cluster Manager lets users
monitor and build clusters of any size
that are easy to provision, operate,
monitor, manage and scale
14. SUSE continues to work with NVIDIA
to enable support for the latest
NVIDIA GPU cards – important in
HPC modeling and simulation
NVIDIA’s expertise in programmable
GPUs has led to breakthroughs in
parallel processing which make
supercomputing inexpensive and
widely accessible
15. Univa and SUSE together
manage containerized HPC and
AI workloads on TSUBAME 3.0
Scaling machine learning for
SUSE Linux containers,
servers, clusters and clouds
with Apache Spark and Univa
16. Altair makes HPC faster, smarter
& easy to manage with PBS Works™
Altair provides services for software
applications that streamline the
workflow management of compute-intensive tasks including solvers,
optimization, modeling, visualization
and analytics
17. Why SUSE Linux for HPC?
• Enterprise Linux with Enterprise support
- Incidents such as Spectre and Meltdown highlight the need for a quick response to system vulnerabilities
• More than just an OS - HPC software included and supported
- SLE HPC includes popular HPC software such as slurm and OpenMPI
• Aggressively priced subscriptions
- SUSE Linux for HPC priced for large and small HPC configurations
• Proven track record in HPC
- 50% of the Top 100 are running SUSE Linux or SLES-based OS
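Because Slurm and Open MPI ship in the HPC module as supported packages, a cluster user can submit MPI work without first assembling a scheduler stack. As a minimal sketch (the partition name, module name, and binary are hypothetical illustrations, not from this deck, and this only runs on a configured Slurm cluster), a batch script might look like:

```bash
#!/bin/bash
#SBATCH --job-name=hello-mpi     # job name shown by squeue
#SBATCH --partition=compute      # hypothetical partition name
#SBATCH --nodes=2                # request two cluster nodes
#SBATCH --ntasks-per-node=4      # four MPI ranks per node
#SBATCH --time=00:05:00          # wall-clock limit

# Make the Open MPI toolchain available (module name varies per site)
module load openmpi

# srun launches all 8 ranks of the (hypothetical) MPI binary
srun ./hello_mpi
```

Submitted with `sbatch hello_mpi.sh`; real partition names, module names, and resource limits are site-specific.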
18. SUSE Linux Enterprise HPC Continuum
• SUSE Linux Enterprise for HPC (X86 and ARM)
- Fully supported by SUSE
• HPC Module (part of SUSE Linux Enterprise HPC)
- Fully supported through your SUSE HPC subscription
- Content inspired by OpenHPC
• PackageHub
- SUSE curated, community supported packages https://packagehub.suse.com/
• openSUSE Leap
- Free, community supported Linux
- Free Developer subscriptions
- SUSE enablement for Azure, AWS Cloud
• Related Products
- SUSE Enterprise Storage
- SUSE Manager
- SUSE OpenStack Cloud
19. SUSE Linux Enterprise Server for ARM
Offering commercial Linux support for ARM AArch64 since November 2016
SLES 15
•Now available (X86, ARM, Power, System z)
•Bi-modal: traditional and CaaSP
•Simplified management
•Kernel 4.12
•Toolchain gcc 7+
SLES 12 for ARM (SP2)
•Initial commercial release AArch64
•SoC: Cavium, Xilinx, AMD, others
•Focus on solution enablement
•Kernel 4.4
•Toolchain gcc 6.2.1
SLES 12 HPC Module
•Supported HPC packages
•Subset of OpenHPC
•Initially includes 13 packages
slurm, pdsh, hwloc, etc.
SLES 12 for ARM (SP3)
•Second SUSE release for AArch64
•Additional SoC enablement
•Expand to early adopters
•Kernel 4.4
•Toolchain gcc 6.2.1 -> gcc 7
SLES 12 HPC
Module
•New packages mpich,hdf5,
munge, mvapich2, numpy,
papi, openblas, openmpi,
netcdf, SCALapack, …
SUSE Enterprise Storage 5
•Ceph software defined storage
•X86 and ARM
SLES for ARM Raspberry Pi
•Commercial support focused on IoT
SLES 12 SP4
•Additional Arm enablement
SLES 12 HPC Module
•Additional HPC packages
•Nagios, adios, metis, ocr, R, scalasca, …
20. SUSE Linux Enterprise HPC offerings
• Available for X86 and Arm HPC clusters
• Extended Service Pack Overlap Support (ESPOS)
• Long Term Service Pack Support (LTSS)
• Simple, one price per cluster node
• Significantly reduced list prices
• Support for smaller cluster sizes
• New product – SLE HPC 15
- Separate from general purpose SLES
21. SUSE Linux HPC Module
Example packages: MUNGE, ScaLAPACK, genders
• All packages supported by
SUSE via SUSE Linux
Enterprise HPC
• Available for x86 and Arm-based platforms
• Flexible release schedule
• SLE 12 and SLE HPC 15
22. SUSE Linux Enterprise HPC Module
All packages supported by SUSE
-Support included in the SLE HPC Subscription
Easy installation via zypper or YaST
Available for X86 and ARM platforms
beginning with SLES 12 SP2
Flexible release schedule. Releases are
independent of Service Pack schedule
•Simplifying access to supported HPC software
* Note: A separate support agreement is required for Icinga2
Column order: Package, HPC Module 1Q17, HPC Module 4Q17, HPC Module 1Q18, SLES 12 HPC Module, SLE HPC 15
conman 0.2.7 0.2.8 0.2.8 0.2.8
cpuid (X86) 20151017 20170122 20170122 20170122 20170122
fftw 3.3.6 3.3.6 3.3.6
ganglia 3.7.2 3.7.2 3.7.2
ganglia-web 3.7.2 3.7.2 3.7.2
genders 1.2.2 1.2.2 1.2.2
GCC 6.2.1 7.3.1 7.3.1 7.3.1
hdf5 1.10.1 1.10.1 1.10.1
hwloc 1.11.5 1.11.8 1.11.8 1.11.8
Icinga2* 2.8.2 2.8.2 n/a
lua-lmod 6.5.11 7.6.1 7.6.1 7.6.1
memkind (X86) 1.1.0 1.1.0 1.6.0
mpiP 3.4.1 3.4.1 3.4.1
mrsh 2.12 2.12 2.12
munge 0.5.12 0.5.12 0.5.13
mvapich2 2.2 2.2.13 2.2.13 2.2.13
netcdf 4.4.1.1 4.4.1.1 4.6.1
netcdf-cxx 4.3.0 4.3.0 4.3.0
netcdf-fortran 4.4.4 4.4.4 4.4.4
numpy 1.13.3 1.13.3 1.14.0
openblas 0.2.20 0.2.20 0.2.20
openmpi 1.10.7 1.10.7 2.1.3
papi 5.5.1 5.5.1 5.5.1 5.5.1
pdsh 2.31 2.33 2.33 2.33 2.33
petsc 3.7.6 3.7.6 3.8.3
phdf5 1.10.1 1.10.1 1.10.1
powerman 2.3.24 2.3.24 Base OS
prun 1.0 1.0 1.0
rasdaemon 0.5.7 0.5.7 Base OS
ScaLAPACK 2.0.2 2.0.2 2.0.2
slurm 16.05.8 17.02.09 17.02.10 17.02.10 17.11.5
Note: SLE 15 customers must use the SLE HPC subscription to
access the HPC Module packages on SLE 15
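On a registered SLE system, the module's packages install through the standard zypper tooling mentioned above. A rough sketch (the product identifier is an assumption based on SUSE's usual SUSEConnect naming, the package selection is illustrative, and an active SLE HPC subscription is required, so this only runs on such a host):

```shell
# List available modules/extensions and their exact activation commands
SUSEConnect --list-extensions

# Enable the HPC module on SLE 12 (identifier assumed; verify in the list above)
SUSEConnect -p sle-module-hpc/12/x86_64

# Install a few of the packages from the table, e.g. the scheduler stack
zypper install slurm munge pdsh hwloc
```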
24. SUSE PackageHub
• High-quality, up-to-date packages delivered by
openSUSE Factory
• Easy to install via zypper or YaST
• Built and maintained by the community of users
• Approved and curated by SUSE
• No additional charge
•Community Supported Packages for SLES
About 1000 packages
available for X86-64
More than 500 packages
available for ARM
Package Category
clustershell Administrative
robinhood Administrative
singularity Runtime
TensorFlow ML Framework
Caffe2 Coming soon
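Enabling PackageHub follows the same SUSEConnect/zypper flow; the sketch below assumes a registered SLES 15 x86-64 host (the `PackageHub/15/x86_64` identifier follows SUSE's documented naming, but verify it with `--list-extensions` before relying on it):

```shell
# Register the Package Hub extension (no additional subscription cost)
SUSEConnect -p PackageHub/15/x86_64

# Refresh repositories and install a community-supported HPC package
zypper refresh
zypper install singularity
```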
26. Other HPC related SUSE Products
SUSE OpenStack Cloud
Compute nodes for Arm 64 coming
SUSE Enterprise Storage
X86-64 and Arm 64 since early 2017
SUSE Manager
Managed node for Arm 64 available
27. SUSE Enterprise Storage Solution for HPC
Most common use case: Tier 2 storage
(Diagram: HPC compute cluster with low-latency storage (Lustre, XFS, NFS, etc.) tiered to SUSE Enterprise Storage)
• Use Cases:
• Primary Storage (Certain Use Cases)
• Nearline or Archival Storage
• Home Directories
• Certified with HPE Data Management Framework (DMF) and iRODS*
*: Coming Soon
28. HPC Top 500 Analysis – Paid OS System Share
• SUSE + CLE: 59%
• Red Hat: 22%
• bullx: 15%
• Ubuntu: 4%
• Represents 116 supercomputers in the Top 500 list
• Over half of the paid Linux OS in the Top 500 are SUSE
29. SUSE High Performance Computing
•SLES for HPC Solution
•Comprehensive range of Linux operating system offerings at multiple
price points
•Simple, one price per cluster node pricing model
•HPC Module with many supported HPC packages
•Competitive pricing
•Multiple service life options
•Full enablement for X86-64 and ARM-based HPC clusters
•Additional open-source packages via PackageHub and OpenSUSE
Editor's Notes
But of course it’s not a perfect world and there are still some customer challenges to address. I grouped them into three categories here. Complexity includes several issues: Cluster management can be complicated, especially when dealing with a very heterogeneous environment. And this is a highly parallel processing environment – which means that the applications must be written to take advantage of the parallelism within the HPC stack. So there are complexities not only in administration and management, but complexities within the software workloads.
Maintenance points to the heterogeneous environment and the fact that many components from different vendors are likely to be involved. Ensuring that they all continue to be supported and interoperate with each other is a big concern and can be a big headache.
And time to solution is always going to put demands on how the applications run effectively – and how all of the underlying components work together. So it’s a combination of the other two categories. The biggest issue here is that parallel software is lacking with many applications – with the need sometimes for major re-designs.
What we can do at SUSE is ensure that we provide a strong open source operating system that leverages the HPC hardware and ensures that we have a cohesive, interoperable HPC stack.
In the HPC space, customers can use SES as primary or secondary storage. Most customers today use SES as second-tier storage, although some customers are also using CephFS as primary storage. Primary storage is limited to smaller clusters (<250 nodes).
In the HPC space, customers are very tuned into their workloads and understand what data is needed where and when. The model here is that a customer keeps the data on SES and loads it onto primary storage when a job needs to be run. E.g., a customer needs to run a weather simulation: they retrieve the data from SES (CephFS) onto Lustre/GPFS/NetApp and then run the simulation. Once the simulation is complete, they put the results back onto SES.
Home directories are another way to do the above, and this works for most customers. One caution: we don't support CephFS quotas, which would be an issue for some customers; this will be supported in SES 6.
Backup storage: SES would also be a good backup repository, used primarily by major ISVs, or (most likely in HPC Space)
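The stage-in / run / stage-out model described in the notes above can be sketched as a simple job wrapper. All paths, mount points, and the job-script name are hypothetical, and the sketch assumes a cluster with both a CephFS mount and a low-latency scratch file system:

```bash
#!/bin/bash
# Tier-2 (SES/CephFS) and tier-1 (low-latency scratch) locations; hypothetical
CEPHFS=/mnt/ses/projects/weather
SCRATCH=/lustre/scratch/weather

# 1. Stage input data from SES onto the fast parallel file system
rsync -a "$CEPHFS/input/" "$SCRATCH/input/"

# 2. Run the simulation against the low-latency copy and wait for completion
sbatch --wait run_simulation.sh "$SCRATCH/input" "$SCRATCH/output"

# 3. Stage results back onto SES for long-term keeping, then free the scratch
rsync -a "$SCRATCH/output/" "$CEPHFS/results/"
rm -rf "$SCRATCH/input" "$SCRATCH/output"
```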
SES is also certified with HPE Data Management Framework