SlideShare a Scribd company logo
The Why and How of HPC-Cloud Hybrids
with OpenStack
OpenStack Australia Day Melbourne
June, 2017
Lev Lafayette, HPC Support and Training Officer, University of Melbourne
lev.lafayette@unimelb.edu.au
1.0 Management Layer
1.1 HPC for performance
High-performance computing (HPC) is any computer system whose architecture allows for
above average performance. Main use case refers to compute clusters with a teamster
separation between head node and workers nodes and a high-speed interconnect acts as a
single system.
1.2 Clouds for flexibility.
Precursor with virtualised hardware. Cloud VMs always have lower performance than HPC.
Question is whether the flexibility is worth the overhead.
1.3 Hybrid HPC/Clouds.
University of Melbourne model, "the chimera". Cloud VMs deployed as HPC nodes and the
Freiburg University Model, "the cyborg", HPC nodes deploying Cloud VMs.
1.4 Reviewing user preferences and usage.
Users always want more of 'x'; real issue identified was queue times. Usage indicated a high
proportion of single-node jobs.
1.5 Review and Architecture.
Review discussed whether UoM needed HPC; architecture was to use existing NeCTAR
Research cloud with an expansion of general cloud compute provisioning and use of a smaller
"true HPC" system on bare metal nodes.
2.0 Physical Layer
2.1 Physical Partitions.
"Real" HPC is a mere c276 cores, 21 GB per core. 2 socket Intel E5-2643 v3 E5-2643,
3.4GHz CPU with 6-core per socket, 192GB memory, 2x 1.2TB SAS drives, 2x 40GbE
network. “Cloud” partitions is almost 400 virtual machines with over 3,000 2.3GHz Haswell
cores with 8GB per core and . There is also a GPU partition with Dual Nvidia Tesla K80s (big
expansion this year), and departmental partitions (water and ashley). Management and login
nodes are VMs as is I/O for transferring data.
2.2 Network.
System network includes: Cloud nodes Cisco Nexus 10Gbe TCP/IP 60 usec latency (mpi-
pingpong); Bare Metal Mellanox 2100 Cumulos Linux 40Gbe TCP/IP 6.85 usec latency and
then RDMA Ethernet 1.15 usec latency
2.3 Storage.
Mountpoints to home, projects (/project /home for user data & scripts, NetApp SAS aggregate
70TB usable) and applications directories across all nodes. Additional mountpoins to VicNode
Aspera Shares. Applications directory currently on management node, needs to be decoupled.
Bare metal nodes have /scratch shared storage for MPI jobs (Dell R730 with 14 x 800GB
mixed use SSDs providing 8TB of usable storage, NFS over RDMA)., /var/local/tmp for single
node jobs, pcie SSD 1.6TB.
3.0 Operating System and
Scheduler Layer
3.1 Red Hat Linux.
Scalable FOSS operating system, high performance, very well suited for research
applications. In November 2016 of the Top 500 Supercomputers worldwide, every single
machine used a "UNIX-like" operating system; and 99.6% used Linux.
3.2 Slurm Workload Manager.
Job schedulers and resource managers allow for unattended background tasks expressed as
batch jobs among the available resources; allows multicore, multinode, arrays, dependencies,
and interactive submissions. The scheduler provides for paramterisation of computer
resources, an automatic submission of execution tasks, and a notification system for incidents.
Slurm (originally Simple Linux Utility for Resource Management), developed by Lawrence
Livermore et al., is FOSS, and used by majority of world's top systems. Scalable, offers many
optional plugins, power-saving features, accounting features, etc. Divided into logical partitions
which correlate with hardware partitions.
3.3 Git, Gerrit, and Puppet.
Version control, paired systems administration, configuration management.
3.4 OpenStack Node Deployment.
Significant use of Nova (compute) service for provisioning and decommissioning of virtual
machines on demand.
4.0 Application Layer
4.1 Source Code and EasyBuild.
Source code provides better control over security updates, integration, development, and
much better performance. Absolutely essential for reproducibility in research environment.
EasyBuild makes source software installs easier with scripts containing specified compilation
blocks (e.g., configuremake, cmake etc) and specified toolchains (GCC, Intel etc) and
environment modules (LMod). Modulefiles allow for dynamic changes to a user's environment
and ease with multiple versions of software applications on a system.
4.2 Compilers, Scripting Languages, and Applications.
Usual range of suspects; Intel and GCC, for compilers (and a little bit of PGI), Python Ruby,
and Perl for scripting languages, OpenMPI wrappers. Major applications include: MATLAB,
Gaussian, NAMD, R, OpenFOAM, Octave etc.
Almost 1,000 applications/versions installed from source, plus packages.
4.3 Containers with Singularity.
A container in a cloud virtual machine on an HPC! Wait, what?
5.0 User Layer
5.1 Karaage.
Spartan uses its own LDAP authentication that is tied to the university Security Assertion
Markup Language (SAML). Users on Spartan must belong to a project. Projects must be led
by a University of Melbourne researcher (the "Principal Investigator") and are subject to
approval by the Head of Research Compute Services. Participants in a project can be
researchers or research support staff from anywhere. Karaage is Django-based application for
user, project, and cluster reporting and management.
5.2 Freshdesk.
OMG Users!
5.3 Online Instructions and Training.
Many users (even post-doctoral researchers) require basic training in Linux command line, a
requisite skill for HPC use. Extensive training programme for researchers available using
andragogical methods, including day-long courses in “Introduction to Linux and HPC Using
Spartan”, “Linux Shell Scripting for High Performance Computing”, and “Parallel Programming
On Spartan”.
Documentation online (Github, Website, and man pages) and plenty of Slurm examples on
system.
6.0 Future Developments
6.1 Cloudbursting with Azure.
Slurm allows cloudbursting via the powersave feature; successfully experiments (and bug
discovery) within the NeCTAR research cloud.
About to add Azure through same login node. Does not mount applications directory; wrap
necessary data for transfer in script.
6.2 GPU Expansion.
Plans for a significant increase in the GPU allocation.
6.3 Test cluster (Thespian).
Everyone has a test environment, some people also have a production and a test
environment.
Test nodes already exist for Cloud and Physical partitions. Replicate management and login
nodes.
6.3 New Architectures
New architectures can be added to the system with separate build node (another VM) and with
software built for that architecture. Don't need an entirely new system.
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, University of Melbourne

More Related Content

What's hot

Computer_Clustering_Technologies
Computer_Clustering_TechnologiesComputer_Clustering_Technologies
Computer_Clustering_TechnologiesManish Chopra
 
The State of Linux Containers
The State of Linux ContainersThe State of Linux Containers
The State of Linux Containers
inside-BigData.com
 
Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016
Andrew Underwood
 
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
Yahoo Developer Network
 
Application layer
Application layerApplication layer
Application layer
Neha Kurale
 
Cluster Computing Seminar.
Cluster Computing Seminar.Cluster Computing Seminar.
Cluster Computing Seminar.
Balvant Biradar
 
Systems Support for Many Task Computing
Systems Support for Many Task ComputingSystems Support for Many Task Computing
Systems Support for Many Task Computing
Eric Van Hensbergen
 
EAS Data Flow lessons learnt
EAS Data Flow lessons learntEAS Data Flow lessons learnt
EAS Data Flow lessons learnteuc-dm-test
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Yao Yao
 
Beowulf cluster
Beowulf clusterBeowulf cluster
Beowulf cluster
Yash Karanke
 
Blazing Fast Lustre Storage
Blazing Fast Lustre StorageBlazing Fast Lustre Storage
Blazing Fast Lustre Storage
Intel IT Center
 
Cluster computer
Cluster  computerCluster  computer
Cluster computer
Ashraful Hoda
 
Enea Enabling Real-Time in Linux Whitepaper
Enea Enabling Real-Time in Linux WhitepaperEnea Enabling Real-Time in Linux Whitepaper
Enea Enabling Real-Time in Linux Whitepaper
Enea Software AB
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster ComputingNIKHIL NAIR
 
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
inside-BigData.com
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
CloudLightning
 
tittle
tittletittle
tittle
uvolodia
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user story
laurabeckcahoon
 
Oracle rac 10g best practices
Oracle rac 10g best practicesOracle rac 10g best practices
Oracle rac 10g best practicesHaseeb Alam
 
Docker Application to Scientific Computing
Docker Application to Scientific ComputingDocker Application to Scientific Computing
Docker Application to Scientific Computing
Peter Bryzgalov
 

What's hot (20)

Computer_Clustering_Technologies
Computer_Clustering_TechnologiesComputer_Clustering_Technologies
Computer_Clustering_Technologies
 
The State of Linux Containers
The State of Linux ContainersThe State of Linux Containers
The State of Linux Containers
 
Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016
 
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
 
Application layer
Application layerApplication layer
Application layer
 
Cluster Computing Seminar.
Cluster Computing Seminar.Cluster Computing Seminar.
Cluster Computing Seminar.
 
Systems Support for Many Task Computing
Systems Support for Many Task ComputingSystems Support for Many Task Computing
Systems Support for Many Task Computing
 
EAS Data Flow lessons learnt
EAS Data Flow lessons learntEAS Data Flow lessons learnt
EAS Data Flow lessons learnt
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
 
Beowulf cluster
Beowulf clusterBeowulf cluster
Beowulf cluster
 
Blazing Fast Lustre Storage
Blazing Fast Lustre StorageBlazing Fast Lustre Storage
Blazing Fast Lustre Storage
 
Cluster computer
Cluster  computerCluster  computer
Cluster computer
 
Enea Enabling Real-Time in Linux Whitepaper
Enea Enabling Real-Time in Linux WhitepaperEnea Enabling Real-Time in Linux Whitepaper
Enea Enabling Real-Time in Linux Whitepaper
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster Computing
 
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
 
tittle
tittletittle
tittle
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user story
 
Oracle rac 10g best practices
Oracle rac 10g best practicesOracle rac 10g best practices
Oracle rac 10g best practices
 
Docker Application to Scientific Computing
Docker Application to Scientific ComputingDocker Application to Scientific Computing
Docker Application to Scientific Computing
 

Similar to The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, University of Melbourne

Japan's post K Computer
Japan's post K ComputerJapan's post K Computer
Japan's post K Computer
inside-BigData.com
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
Peter Clapham
 
Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBDDan Frincu
 
Openstack_administration
Openstack_administrationOpenstack_administration
Openstack_administrationAshish Sharma
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
Steve Wong
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
MayaData Inc
 
Microx - A Unix like kernel for Embedded Systems written from scratch.
Microx - A Unix like kernel for Embedded Systems written from scratch.Microx - A Unix like kernel for Embedded Systems written from scratch.
Microx - A Unix like kernel for Embedded Systems written from scratch.Waqar Sheikh
 
Accelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesAccelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing Technologies
Intel® Software
 
Parallel_and_Cluster_Computing.ppt
Parallel_and_Cluster_Computing.pptParallel_and_Cluster_Computing.ppt
Parallel_and_Cluster_Computing.ppt
MohmdUmer
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of ML
Nordic APIs
 
Clustering by AKASHMSHAH
Clustering by AKASHMSHAHClustering by AKASHMSHAH
Clustering by AKASHMSHAH
Akash M Shah
 
djypllh5r1gjbaekxgwv-signature-cc6692615bbc55079760b9b0c6636bc58ec509cd0446cb...
djypllh5r1gjbaekxgwv-signature-cc6692615bbc55079760b9b0c6636bc58ec509cd0446cb...djypllh5r1gjbaekxgwv-signature-cc6692615bbc55079760b9b0c6636bc58ec509cd0446cb...
djypllh5r1gjbaekxgwv-signature-cc6692615bbc55079760b9b0c6636bc58ec509cd0446cb...
Dr. Thippeswamy S.
 
Rha cluster suite wppdf
Rha cluster suite wppdfRha cluster suite wppdf
Rha cluster suite wppdf
projectmgmt456
 
Zou Layered VO PDCAT2008 V0.5 Concise
Zou Layered VO PDCAT2008 V0.5 ConciseZou Layered VO PDCAT2008 V0.5 Concise
Zou Layered VO PDCAT2008 V0.5 Concise
yongqiangzou
 
M0339_v1_6977127809 (1).pptx
M0339_v1_6977127809 (1).pptxM0339_v1_6977127809 (1).pptx
M0339_v1_6977127809 (1).pptx
viveknagle4
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
DataWorks Summit
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computers
Ryousei Takano
 
Red Hat multi-cluster management & what's new in OpenShift
Red Hat multi-cluster management & what's new in OpenShiftRed Hat multi-cluster management & what's new in OpenShift
Red Hat multi-cluster management & what's new in OpenShift
Kangaroot
 
Beyond static configuration
Beyond static configurationBeyond static configuration
Beyond static configuration
Stefan Schimanski
 
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SAMeh Zaghloul
 

Similar to The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, University of Melbourne (20)

Japan's post K Computer
Japan's post K ComputerJapan's post K Computer
Japan's post K Computer
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBD
 
Openstack_administration
Openstack_administrationOpenstack_administration
Openstack_administration
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
 
Microx - A Unix like kernel for Embedded Systems written from scratch.
Microx - A Unix like kernel for Embedded Systems written from scratch.Microx - A Unix like kernel for Embedded Systems written from scratch.
Microx - A Unix like kernel for Embedded Systems written from scratch.
 
Accelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesAccelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing Technologies
 
Parallel_and_Cluster_Computing.ppt
Parallel_and_Cluster_Computing.pptParallel_and_Cluster_Computing.ppt
Parallel_and_Cluster_Computing.ppt
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of ML
 
Clustering by AKASHMSHAH
Clustering by AKASHMSHAHClustering by AKASHMSHAH
Clustering by AKASHMSHAH
 
djypllh5r1gjbaekxgwv-signature-cc6692615bbc55079760b9b0c6636bc58ec509cd0446cb...
djypllh5r1gjbaekxgwv-signature-cc6692615bbc55079760b9b0c6636bc58ec509cd0446cb...djypllh5r1gjbaekxgwv-signature-cc6692615bbc55079760b9b0c6636bc58ec509cd0446cb...
djypllh5r1gjbaekxgwv-signature-cc6692615bbc55079760b9b0c6636bc58ec509cd0446cb...
 
Rha cluster suite wppdf
Rha cluster suite wppdfRha cluster suite wppdf
Rha cluster suite wppdf
 
Zou Layered VO PDCAT2008 V0.5 Concise
Zou Layered VO PDCAT2008 V0.5 ConciseZou Layered VO PDCAT2008 V0.5 Concise
Zou Layered VO PDCAT2008 V0.5 Concise
 
M0339_v1_6977127809 (1).pptx
M0339_v1_6977127809 (1).pptxM0339_v1_6977127809 (1).pptx
M0339_v1_6977127809 (1).pptx
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computers
 
Red Hat multi-cluster management & what's new in OpenShift
Red Hat multi-cluster management & what's new in OpenShiftRed Hat multi-cluster management & what's new in OpenShift
Red Hat multi-cluster management & what's new in OpenShift
 
Beyond static configuration
Beyond static configurationBeyond static configuration
Beyond static configuration
 
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
 

More from OpenStack

Swinburne University of Technology - Shunde Zhang & Kieran Spear, Aptira
Swinburne University of Technology - Shunde Zhang & Kieran Spear, AptiraSwinburne University of Technology - Shunde Zhang & Kieran Spear, Aptira
Swinburne University of Technology - Shunde Zhang & Kieran Spear, Aptira
OpenStack
 
Related OSS Projects - Peter Rowe, Flexera Software
Related OSS Projects - Peter Rowe, Flexera SoftwareRelated OSS Projects - Peter Rowe, Flexera Software
Related OSS Projects - Peter Rowe, Flexera Software
OpenStack
 
Supercomputing by API: Connecting Modern Web Apps to HPC
Supercomputing by API: Connecting Modern Web Apps to HPCSupercomputing by API: Connecting Modern Web Apps to HPC
Supercomputing by API: Connecting Modern Web Apps to HPC
OpenStack
 
Federation and Interoperability in the Nectar Research Cloud
Federation and Interoperability in the Nectar Research CloudFederation and Interoperability in the Nectar Research Cloud
Federation and Interoperability in the Nectar Research Cloud
OpenStack
 
Simplifying the Move to OpenStack
Simplifying the Move to OpenStackSimplifying the Move to OpenStack
Simplifying the Move to OpenStack
OpenStack
 
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red HatHyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
OpenStack
 
Migrating your infrastructure to OpenStack - Avi Miller, Oracle
Migrating your infrastructure to OpenStack - Avi Miller, OracleMigrating your infrastructure to OpenStack - Avi Miller, Oracle
Migrating your infrastructure to OpenStack - Avi Miller, Oracle
OpenStack
 
A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...
A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...
A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...
OpenStack
 
Enabling OpenStack for Enterprise - Tarso Dos Santos, Veritas
Enabling OpenStack for Enterprise - Tarso Dos Santos, VeritasEnabling OpenStack for Enterprise - Tarso Dos Santos, Veritas
Enabling OpenStack for Enterprise - Tarso Dos Santos, Veritas
OpenStack
 
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSEUnderstanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
OpenStack
 
OpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus Networks
OpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus NetworksOpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus Networks
OpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus Networks
OpenStack
 
Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...
Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...
Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...
OpenStack
 
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
OpenStack
 
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack
 
Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...
Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...
Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...
OpenStack
 
Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...
Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...
Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...
OpenStack
 
Traditional Enterprise to OpenStack Cloud - An Unexpected Journey
Traditional Enterprise to OpenStack Cloud - An Unexpected JourneyTraditional Enterprise to OpenStack Cloud - An Unexpected Journey
Traditional Enterprise to OpenStack Cloud - An Unexpected Journey
OpenStack
 
Building a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash University
Building a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash UniversityBuilding a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash University
Building a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash University
OpenStack
 
Monitoring Uptime on the NeCTAR Research Cloud - Andy Botting, University of ...
Monitoring Uptime on the NeCTAR Research Cloud - Andy Botting, University of ...Monitoring Uptime on the NeCTAR Research Cloud - Andy Botting, University of ...
Monitoring Uptime on the NeCTAR Research Cloud - Andy Botting, University of ...
OpenStack
 
Containers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStack
Containers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStackContainers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStack
Containers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStack
OpenStack
 

More from OpenStack (20)

Swinburne University of Technology - Shunde Zhang & Kieran Spear, Aptira
Swinburne University of Technology - Shunde Zhang & Kieran Spear, AptiraSwinburne University of Technology - Shunde Zhang & Kieran Spear, Aptira
Swinburne University of Technology - Shunde Zhang & Kieran Spear, Aptira
 
Related OSS Projects - Peter Rowe, Flexera Software
Related OSS Projects - Peter Rowe, Flexera SoftwareRelated OSS Projects - Peter Rowe, Flexera Software
Related OSS Projects - Peter Rowe, Flexera Software
 
Supercomputing by API: Connecting Modern Web Apps to HPC
Supercomputing by API: Connecting Modern Web Apps to HPCSupercomputing by API: Connecting Modern Web Apps to HPC
Supercomputing by API: Connecting Modern Web Apps to HPC
 
Federation and Interoperability in the Nectar Research Cloud
Federation and Interoperability in the Nectar Research CloudFederation and Interoperability in the Nectar Research Cloud
Federation and Interoperability in the Nectar Research Cloud
 
Simplifying the Move to OpenStack
Simplifying the Move to OpenStackSimplifying the Move to OpenStack
Simplifying the Move to OpenStack
 
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red HatHyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
 
Migrating your infrastructure to OpenStack - Avi Miller, Oracle
Migrating your infrastructure to OpenStack - Avi Miller, OracleMigrating your infrastructure to OpenStack - Avi Miller, Oracle
Migrating your infrastructure to OpenStack - Avi Miller, Oracle
 
A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...
A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...
A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...
 
Enabling OpenStack for Enterprise - Tarso Dos Santos, Veritas
Enabling OpenStack for Enterprise - Tarso Dos Santos, VeritasEnabling OpenStack for Enterprise - Tarso Dos Santos, Veritas
Enabling OpenStack for Enterprise - Tarso Dos Santos, Veritas
 
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSEUnderstanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
 
OpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus Networks
OpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus NetworksOpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus Networks
OpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus Networks
 
Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...
Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...
Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...
 
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
 
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
 
Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...
Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...
Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...
 
Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...
Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...
Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...
 
Traditional Enterprise to OpenStack Cloud - An Unexpected Journey
Traditional Enterprise to OpenStack Cloud - An Unexpected JourneyTraditional Enterprise to OpenStack Cloud - An Unexpected Journey
Traditional Enterprise to OpenStack Cloud - An Unexpected Journey
 
Building a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash University
Building a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash UniversityBuilding a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash University
Building a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash University
 
Monitoring Uptime on the NeCTAR Research Cloud - Andy Botting, University of ...
Monitoring Uptime on the NeCTAR Research Cloud - Andy Botting, University of ...Monitoring Uptime on the NeCTAR Research Cloud - Andy Botting, University of ...
Monitoring Uptime on the NeCTAR Research Cloud - Andy Botting, University of ...
 
Containers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStack
Containers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStackContainers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStack
Containers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStack
 

Recently uploaded

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 

Recently uploaded (20)

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 

The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, University of Melbourne

  • 1. The Why and How of HPC-Cloud Hybrids with OpenStack OpenStack Australia Day Melbourne June, 2017 Lev Lafayette, HPC Support and Training Officer, University of Melbourne lev.lafayette@unimelb.edu.au
  • 2. 1.0 Management Layer 1.1 HPC for performance High-performance computing (HPC) is any computer system whose architecture allows for above average performance. Main use case refers to compute clusters with a teamster separation between head node and workers nodes and a high-speed interconnect acts as a single system. 1.2 Clouds for flexibility. Precursor with virtualised hardware. Cloud VMs always have lower performance than HPC. Question is whether the flexibility is worth the overhead. 1.3 Hybrid HPC/Clouds. University of Melbourne model, "the chimera". Cloud VMs deployed as HPC nodes and the Freiburg University Model, "the cyborg", HPC nodes deploying Cloud VMs. 1.4 Reviewing user preferences and usage. Users always want more of 'x'; real issue identified was queue times. Usage indicated a high proportion of single-node jobs. 1.5 Review and Architecture. Review discussed whether UoM needed HPC; architecture was to use existing NeCTAR Research cloud with an expansion of general cloud compute provisioning and use of a smaller "true HPC" system on bare metal nodes.
  • 3. 2.0 Physical Layer 2.1 Physical Partitions. "Real" HPC is a mere c276 cores, 21 GB per core. 2 socket Intel E5-2643 v3 E5-2643, 3.4GHz CPU with 6-core per socket, 192GB memory, 2x 1.2TB SAS drives, 2x 40GbE network. “Cloud” partitions is almost 400 virtual machines with over 3,000 2.3GHz Haswell cores with 8GB per core and . There is also a GPU partition with Dual Nvidia Tesla K80s (big expansion this year), and departmental partitions (water and ashley). Management and login nodes are VMs as is I/O for transferring data. 2.2 Network. System network includes: Cloud nodes Cisco Nexus 10Gbe TCP/IP 60 usec latency (mpi- pingpong); Bare Metal Mellanox 2100 Cumulos Linux 40Gbe TCP/IP 6.85 usec latency and then RDMA Ethernet 1.15 usec latency 2.3 Storage. Mountpoints to home, projects (/project /home for user data & scripts, NetApp SAS aggregate 70TB usable) and applications directories across all nodes. Additional mountpoins to VicNode Aspera Shares. Applications directory currently on management node, needs to be decoupled. Bare metal nodes have /scratch shared storage for MPI jobs (Dell R730 with 14 x 800GB mixed use SSDs providing 8TB of usable storage, NFS over RDMA)., /var/local/tmp for single node jobs, pcie SSD 1.6TB.
  • 4. 3.0 Operating System and Scheduler Layer 3.1 Red Hat Linux. Scalable FOSS operating system, high performance, very well suited for research applications. In November 2016 of the Top 500 Supercomputers worldwide, every single machine used a "UNIX-like" operating system; and 99.6% used Linux. 3.2 Slurm Workload Manager. Job schedulers and resource managers allow for unattended background tasks expressed as batch jobs among the available resources; allows multicore, multinode, arrays, dependencies, and interactive submissions. The scheduler provides for paramterisation of computer resources, an automatic submission of execution tasks, and a notification system for incidents. Slurm (originally Simple Linux Utility for Resource Management), developed by Lawrence Livermore et al., is FOSS, and used by majority of world's top systems. Scalable, offers many optional plugins, power-saving features, accounting features, etc. Divided into logical partitions which correlate with hardware partitions. 3.3 Git, Gerrit, and Puppet. Version control, paired systems administration, configuration management. 3.4 OpenStack Node Deployment. Significant use of Nova (compute) service for provisioning and decommissioning of virtual machines on demand.
  • 5. 4.0 Application Layer 4.1 Source Code and EasyBuild. Source code provides better control over security updates, integration, development, and much better performance. Absolutely essential for reproducibility in research environment. EasyBuild makes source software installs easier with scripts containing specified compilation blocks (e.g., configuremake, cmake etc) and specified toolchains (GCC, Intel etc) and environment modules (LMod). Modulefiles allow for dynamic changes to a user's environment and ease with multiple versions of software applications on a system. 4.2 Compilers, Scripting Languages, and Applications. Usual range of suspects; Intel and GCC, for compilers (and a little bit of PGI), Python Ruby, and Perl for scripting languages, OpenMPI wrappers. Major applications include: MATLAB, Gaussian, NAMD, R, OpenFOAM, Octave etc. Almost 1,000 applications/versions installed from source, plus packages. 4.3 Containers with Singularity. A container in a cloud virtual machine on an HPC! Wait, what?
  • 6. 5.0 User Layer 5.1 Karaage. Spartan uses its own LDAP authentication that is tied to the university Security Assertion Markup Language (SAML). Users on Spartan must belong to a project. Projects must be led by a University of Melbourne researcher (the "Principal Investigator") and are subject to approval by the Head of Research Compute Services. Participants in a project can be researchers or research support staff from anywhere. Karaage is Django-based application for user, project, and cluster reporting and management. 5.2 Freshdesk. OMG Users! 5.3 Online Instructions and Training. Many users (even post-doctoral researchers) require basic training in Linux command line, a requisite skill for HPC use. Extensive training programme for researchers available using andragogical methods, including day-long courses in “Introduction to Linux and HPC Using Spartan”, “Linux Shell Scripting for High Performance Computing”, and “Parallel Programming On Spartan”. Documentation online (Github, Website, and man pages) and plenty of Slurm examples on system.
  • 7. 6.0 Future Developments 6.1 Cloudbursting with Azure. Slurm allows cloudbursting via the powersave feature; successfully experiments (and bug discovery) within the NeCTAR research cloud. About to add Azure through same login node. Does not mount applications directory; wrap necessary data for transfer in script. 6.2 GPU Expansion. Plans for a significant increase in the GPU allocation. 6.3 Test cluster (Thespian). Everyone has a test environment, some people also have a production and a test environment. Test nodes already exist for Cloud and Physical partitions. Replicate management and login nodes. 6.3 New Architectures New architectures can be added to the system with separate build node (another VM) and with software built for that architecture. Don't need an entirely new system.