
Chasing the Rainbow – National Computational Infrastructure’s Pursuit of High-Performance OpenStack Cloud: Andrew Howard, NCI


Audience: Intermediate

About: With a mission to foster ambitious and aspirational research objectives, the National Computational Infrastructure (NCI) in Australia operates world-class computing services for a collaboration of Australian national research organisations and research-intensive universities.

As an increasing number of applications, such as bioinformatics, leverage Big Data, and more of our virtual laboratories call for elastic, self-service provisioning and research data sharing, NCI has created platforms which combine the flexibility of OpenStack Cloud provisioning with both high-speed Ethernet and high-performance InfiniBand fabrics to deliver a union of compute and I/O which challenges traditional HPC performance.

In this presentation, we will discuss the various approaches we explored leading to our current implementation, share some of our performance results and examine the role that high-speed networks and fabrics play in enhancing NCI cloud performance and efficiency.

Speaker Bio: Andrew Howard – HPC and Cloud, National Computational Infrastructure

Andrew has many decades of hands-on technical, diplomatic and logistics experience covering a wide range of standard and bespoke technologies, languages and applications within Industry, Government and Academia nationally and internationally.

A fascination with computers and networking as a student led Andrew to pioneer network implementations: installing the first Ethernet in Australia at Digital Equipment Corp in the early 80’s; the first fibre optic extended Ethernet in the mid 80’s; national converged DECnet, TCP/IP, SNA and X.25 networks in the late 80’s; ISDN, Frame Relay and secure networks in the early 90’s; development and operation of one of the first Internet Service Providers in Australia in the 90’s; managing the development and delivery of the next generation Australian National Research and Education Network GrangeNet and AARNet3 networks; managing the development and operation of the Australian Government ICON fibre network; and setting world speed records in international networking in the early part of this century.

Since joining the Australian National University in 2006 he has managed the evaluation, development and implementation of high-speed communications systems, fibre networks and collaboration facilities. He represents the University at international research network groups including APAN, Internet2 and TNC, and has served as Co-Chair of a number of APAN Working Groups. As Co-Chair of the APAN E-Culture working group for many years, he led the production of the Dancing Q and Dancing Across Oceans performance events.

OpenStack Australia Day - Sydney 2016

Published in: Technology


  1. 1. @NCInews National Computational Infrastructure’s Pursuit of High-Performance in OpenStack Clouds Andrew Howard & Matthew Sanderson HPC and Cloud Systems National Computational Infrastructure, The Australian National University
  2. 2. o NCI Contributors o Dr. Muhammad Atif o Mr. Simon Fowler o Mr. Jakub Chrzeszczyk o Dr. Ching-Yeh (Leaf) Lin o Dr. Benjamin Menadue Thanks to my colleagues
  3. 3. o NCI Overview o Why are we interested in HPC Clouds? o NCI Cloud past and present o What have we done to implement an HPC Cloud o Containers o MPI Performance under Docker o Conclusion o Questions Agenda
  4. 4. NCI: an overview Mission: World-class, high-end computing services for Australian research and innovation What is NCI: • Australia’s most highly integrated e-infrastructure environment • Petascale supercomputer + highest performance research cloud + highest performance storage in the southern hemisphere • Comprehensive and integrated expert service • National/internationally renowned support team NCI is national and strategic: • Driven by national research priorities and excellence • Engaged with research institutions/collaborations and industry • A capability beyond the capacity of any single institution • Sustained by a collaboration of agencies/universities NCI is important to Australia because it: • Enables research that otherwise would be impossible • Enables delivery of world-class science • Enables interrogation of big data, otherwise impossible • Enables high-impact research that matters; informs public policy • Attracts and retains world-class researchers for Australia • Catalyses development of young researchers’ skills Research Outcomes Communities and Institutions/ Access and Services Expertise Support and Development HPC Services
 Virtual Laboratories/
 Data-intensive Services Integration Compute (HPC/Cloud) 
 Storage/Network Infrastructure
  5. 5. NCI today: comprehensive, integrated,
 quality service, innovative and valued Facts and Figures • Supercomputer (Raijin): 1.2 petaflops (1,200,000,000,000,000 operations/sec) – 57,492 cores, 160 Tbytes memory, 10 petabytes storage, 9 Tbit/sec backplane – Australia’s highest sustained performance research supercomputer • HPC Cloud: 3,200 cores, supercomputer spec. for orchestrating data services • Global integrated storage (highest performance filesystems in Australia) – 20 PB disk (up to 120 Gbytes/sec b/w); 40 petabytes of tape for archive purposes • Power consumption: 1.6-2.0 megawatts • Service researchers at 30 universities, 5 national science agencies and 2 MRIs • ~2,500 research users; 1,400 journal articles supported by NCI services • Support for more than $50M of national competitive research grants annually • One-third of Fellows elected to Australian Academy of Science (2014-15) are NCI users Scale • HPC and data infrastructure: $47M replacement value (NCRIS, Aust. Gov’t) • Purpose built data centre: $24M replacement value (2012) • Recurrent operations: $17-18M p.a. (partners: $11+M; NCRIS: $5+M) – Co-investment: Science agencies ($6M p.a.), Universities and ARC ($5+M p.a.) Expert, agile and secure • 60 expert staff: operations, user support, high-performance computing and data, 
 collections management/curation, visualisation, virtual lab development, etc. • Driven by the goals of researchers and research institutions • Annual IT security audits
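
The 1.2 petaflops figure for Raijin can be sanity-checked from the core count on this slide and the 2.60 GHz clock listed on the platform slide. The sketch below is a rough estimate only: the value of 8 double-precision operations per core per cycle is an assumption (typical for AVX on Sandy Bridge), not a number taken from the slides, and `peak_flops` is a hypothetical helper name.

```python
# Rough theoretical peak for Raijin from figures quoted in the slides.
CORES = 57_492          # from the Facts and Figures slide
CLOCK_HZ = 2.6e9        # E5-2670 @ 2.60GHz, from the platform slide
FLOPS_PER_CYCLE = 8     # ASSUMPTION: double-precision AVX on Sandy Bridge


def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak in floating-point operations per second."""
    return cores * clock_hz * flops_per_cycle


peak = peak_flops(CORES, CLOCK_HZ, FLOPS_PER_CYCLE)
print(f"{peak / 1e15:.2f} PFLOPS")  # prints "1.20 PFLOPS"
```

Under that assumption the estimate agrees with the quoted 1.2 petaflops.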
  6. 6. Inside the 900 sq. m. machine room
  7. 7. Supports the full gamut of research pure strategic applied industry • Fundamental sciences • Mathematics, physics, chemistry, astronomy, • ARC Centres of Excellence (ARCCSS, CAASTRO, CUDOS) • Research with an intended strategic outcome • Environmental, medical, geoscientific • e.g., energy (UNSW), food security (ANU), geosciences (Sydney) • Supporting industry and innovation • e.g., ANU/UNSW startup, Lithicon, sold for $76M to US company FEI in 2014; multinational miner • Informing public policy; real economic impact • Climate variation, next-gen weather forecasting, disaster management (CoE, BoM, CSIRO, GA)
  8. 8. Services • Services and Technologies (~30 staff) – Operations— robust/expert/secure (20 staff incl. 4 vendor contracted) – HPC • Expert user support (9) • Largest research software library in Australia (300+ applications in all fields) – Cloud • High-performance: VMs, Clusters • Secure, high-performance filesystem, integrated into NCI workflow environment – Storage • Active (high-performance Lustre parallel) and archival (dual copy HSM tape); • Partner shares; Collections; Partner dedicated • Research Engagement and Innovation (~20 staff) – HPC and Data-Intensive Innovation • Upscaling priority applications (e.g., Fujitsu-NCI collaboration on ACCESS), • Bioinformatics pipelines (APN, melanoma, human genome) – Virtual Environments • Climate/Weather, All-sky Astrophysics, Geophysics, etc. (NeCTAR) – Data Collections • Management, publication, citation— strong environmental focus + other – Visualisation • Drishti, Voluminous, Interactive presentations
  9. 9. Virtual Environments and Laboratories
  10. 10. Moving to friction-free environments, e.g virtual desktops
  11. 11. Courtesy: Geoscience Australia Shared Science Platforms for Shared Science Services
  12. 12. NCI provides user with Data as a Service User generates/ transfers data NCI provides fast data storage Data Management Portal HPC Data Curation, Publish, Citation Web based real-time analytics software, Virtual Desktop Interface, Virtual Laboratory, and other services Data Manager completes DMP and creates a catalogue Super computer users Paper and Data published Data visualisation NCI Vislab Data sharing and re-use End-to-end Data Life Cycle
  13. 13. • The Climate & Weather Science Laboratory (CWSLab) is an innovation in climate data analysis enabled by NCI via NeCTAR funding • Ideal for performing interactive analysis, code development, visualising data and publication writing • Analogous to local computer but with access to many petabytes of climate & weather data • Virtual Desktop Infrastructure established with access to climate data • Users log in to a desktop interface Earth systems & environmental science data in cloud computing
  14. 14. Cloud Infrastructure o NCI has been Cloud Computing since 2009 o Red Hat OpenStack Cloud (2013) o 384 core private cloud o Enterprise grade o Typically for Virtual Laboratories o Uptime of 100% for past two years o Icehouse (2014) o Migrate nova-network to Neutron o 56G Ethernet o Ceph volume services added o Scale up from 32 nodes to 100 o Kilo (2015) o Power efficiency improvements reduce idle load from 120W to 65W o Increased overcommit ratio
  15. 15. Cloud Infrastructure o NeCTAR Research Cloud (2013 – Public Cloud). o IaaS and PaaS o Foundation node of NeCTAR (Australia’s National E-research cloud) o Intel Sandy Bridge (3200 cores with Hyper Threading). o Full Fat Tree 56G Ethernet (Mellanox) o Higher initial cost but provides consistent network performance and flexibility o 800GB of SSDs per compute node o 2x400GB SSDs in RAID-0 o Access to 0.5PB of Ceph storage on the same fabric. o Delivering on-demand research computing
  16. 16. Cloud Infrastructure o Tenjin Partner Cloud (2013) o Flagship Cloud for data intensive compute. o Same hardware platform as NeCTAR Cloud o Two zones: o Density (Overcommit of CPUs) o Performance (No CPU or memory overcommit) o RDO with Neutron and CentOS 7.X. o Architected to support both the high computational and I/O performance required for “big data” research. o 2x400GB SSDs per compute node in RAID-0 (800GB per node) o Access to ~1 PB of Ceph storage o Access to 30 PB of Lustre storage o SR-IOV, FFT and 56G Ethernet. o On-demand access to GPU nodes. o Federated with NCI HPC environment.
  17. 17. Cloud Infrastructure o InfiniCloud (Experimental) o FDR (56 Gb/s) InfiniBand Cloud o Icehouse then Kilo, heavily modified at NCI. Based on Mellanox recipe. o Virtual Functions o Mellanox InfiniBand HCA is presented into Virtual Machines via SR-IOV o InfiniBand PKey to VLAN mapping o Near line-rate IB performance o Once stable, Tenjin may move to native IB. o Containers o Docker o Rocket?
  18. 18. Job statistics on Raijin- Users are really into parallel jobs NCI’s Awesome dashboard Why a High Performance Cloud?
  19. 19. Why a High Performance Cloud? o Complement NCI supercomputer offerings. o Accelerate processing of single-node jobs o Virtual Laboratories. o Remote Job Submission. o Visualisation. o Serving research data to the Web o Requiring access to Global file-system at NCI. o On-Demand GPU access. o Workloads not best suited for Lustre. o Local scratch is SSD on NCI Cloud compared to SATA HDD on Raijin. o Pipelines and workloads that are not suited to the supercomputer o Packages that cannot/will not be supported. o Proofs of concept before making a big run. o Cloud burst o Offloading single-node jobs to the Cloud when the supercomputer system is heavily used. o Student Courses. o RDMA (using NeCTAR)
  20. 20. o Many research workloads utilise very large data sets o Secure access to data in place o Seamlessly combine resources across NCI HPC and Cloud without copying data into and out of the Cloud o Migrate workloads transparently between domains (HPC, Cloud) o On-demand provisioning o Legacy and/or emerging elastic workflows o Provide a wider range of services to NCI users o GPU clusters o Utilise the most appropriate and energy efficient hardware to achieve research outcomes Combining computation and data
  21. 21. NCI Systems Connectivity [Diagram labels: 10 GigE; /g/data 56Gb FDR IB fabric; Raijin 56Gb FDR IB fabric; /g/data filesystems: /g/data1 ~7.4PB, /g/data2 ~6.5PB, /g/data3 ~7.3PB; Raijin FS: /short 7.6PB, /home, /system, /images, /apps; Massdata: cache 1.0PB, tape 12.3PB; Raijin compute; Raijin login + data movers; NCI data movers; VMware; OpenStack Tenjin; OpenStack NeCTAR; Ceph: NeCTAR 0.5PB, Tenjin 0.5PB; links to the Internet and to Huxley DC]
  22. 22. Comparing Cloud System performance o Elements which differentiate NCI HPC and Cloud systems o Workflows o Communications architecture o InfiniBand and Ethernet o InfiniBand o FDR 56 Gb/s and EDR 100 Gb/s o Lossless - full fat tree o Deterministic network latency and throughput o Hardware offload for communication through RDMA o Kernel and TCP/IP stack bypass o Ethernet o 10 Gb/s, 40 Gb/s, 56 Gb/s and 100 Gb/s o 10G is typical for Cloud presentation o Can be lossless or a traditional switched network o RDMA o Remote Direct Memory Access o Offloads communication from operating system network stack o Heavily used in HPC applications through various MPI libraries
  23. 23. Why are packet loss and latency important? Image: ESnet
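
The slide's image is not reproduced in this transcript. One widely used way to see why loss and latency matter for TCP is the Mathis et al. approximation, which bounds steady-state throughput by (MSS / RTT) × C / sqrt(p), with C ≈ sqrt(3/2). The sketch below assumes that model; whether it is the one shown in the image is an assumption, and the function name and example values are illustrative only.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate upper bound on TCP throughput in bits per second.

    Uses the Mathis et al. approximation: (MSS / RTT) * C / sqrt(p),
    with C = sqrt(3/2). This is a model, not a measurement.
    """
    if rtt_s <= 0 or loss_rate <= 0:
        raise ValueError("rtt_s and loss_rate must be positive")
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


# Illustrative values only: 1460-byte MSS, 0.01% loss, three round-trip times.
for rtt_ms in (1, 10, 100):
    bps = mathis_throughput_bps(1460, rtt_ms / 1000, 1e-4)
    print(f"RTT {rtt_ms:>3} ms -> {bps / 1e6:,.1f} Mbit/s")
```

In this model throughput falls in proportion to round-trip time and to the square root of the loss rate, which is one reason a lossless, low-latency fabric suits tightly coupled workloads.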
  24. 24. Examining container performance o What are we measuring? o Can traditional HPC level MPI applications run effectively within a container environment? o How do latency and throughput compare to our baseline HPC performance? o Comparison of MPI RDMA performance in various environments o Native InfiniBand (Full Fat Tree) o Ethernet and RoCE (Full Fat Tree and Switched) o RDMA in a container o How does it compare to Bare Metal performance?
  25. 25. Preliminary Results (Platform)
      Cluster | Architecture | Interconnect | Location
      Raijin | Xeon(R) CPU E5-2670 @ 2.60GHz (Sandy Bridge) | Mellanox FDR InfiniBand - FFT | NCI
      Tenjin | Intel Xeon E312xx @ 2.60 GHz (Sandy Bridge) | Mellanox FDR InfiniBand, flashed to 56G Ethernet - FFT | NCI
      Tenjin (Container) | Intel Xeon E312xx @ 2.60 GHz (Sandy Bridge) | Mellanox FDR InfiniBand, flashed to 56G Ethernet - FFT | NCI
      InfiniCloud | Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz | Mellanox FDR InfiniBand | NCI
      10G-Cloud | AMD Opteron 63xx | 10G Ethernet |
      o OpenMPI 1.10 o All applications were compiled with GCC at -O3. The Intel compilers were not used, to achieve a fair comparison. o All clouds were based on OpenStack (Icehouse, Juno, Kilo). o Preliminary results: 10 runs; the max and min results were discarded and the remainder averaged. o Comprehensive results will be presented in a white paper.
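
The averaging procedure described for the preliminary results (10 runs, discard the maximum and minimum, average the rest) can be sketched as follows. The function name and the sample timings are hypothetical; the slides do not show the authors' actual scripts or raw data.

```python
def trimmed_average(samples):
    """Average after dropping one minimum and one maximum sample."""
    if len(samples) < 3:
        raise ValueError("need at least 3 samples to drop min and max")
    ordered = sorted(samples)
    kept = ordered[1:-1]  # drop the single smallest and single largest value
    return sum(kept) / len(kept)


# Hypothetical latencies in microseconds from 10 runs (not real results).
runs = [1.9, 2.0, 2.1, 2.0, 9.5, 2.0, 1.2, 2.1, 2.0, 1.9]
print(trimmed_average(runs))
```

Dropping the extremes keeps a single outlier run, such as the 9.5 above, from skewing the reported average.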
  26. 26. Point to Point Latency
  27. 27. Point to Point Bandwidth [Chart: OSU Point to Point Bandwidth (MB/sec), higher is better. x-axis: message size in bytes (1 to 4M); y-axis: bandwidth (0 to 7000 MB/sec). Series: BW-AWS-WEB, 10GbE-Cloud, Tenjin-TCP, Tenjin-Yalla, Tenjin-RoCE, Tenjin-Container, InfiniCloud-VM, InfiniCloud-HY, Raijin.]
  28. 28. Bioinformatics workload – Single compute node. Trinity is a bioinformatics de novo sequence-assembly package consisting of three programs: Inchworm (OpenMP, gcc), Chrysalis (OpenMP, gcc) and Butterfly (Java). The calculation was carried out using the procedure published by BJ Haas et al., Nature Protocols 8, 1494–1512 (2013). Courtesy: Dr. Ching-Yeh (Leaf) Lin at NCI. [Chart: Bioinformatics workload speedups compared to 10G-XXX-Cloud, 16 CPUs on one compute node (higher is better). y-axis: 0 to 2. Groups: Inchworm, Chrysalis, Butterfly. Series: Raijin, Tenjin, 10G-XXX-Cloud.]
  29. 29. NAS Parallel Benchmarks [Chart: Speed-up of NPB Class 'C' with 32 and 64 processes, normalized w.r.t. 32 processes on the 10G Ethernet Cloud (higher is better). y-axis: 0 to 10. Benchmarks: CG, EP, FT, IS, LU, MG. Series: 10GbE-Cloud, Tenjin, Tenjin Container and Raijin, each at 32P and 64P.]
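
The normalisation used in the NPB chart (speed-up relative to 32 processes on the 10G Ethernet cloud) amounts to dividing the baseline run time by each configuration's run time. A minimal sketch, using made-up timings because the chart's underlying numbers are not in this transcript; `speedups` is a hypothetical helper:

```python
def speedups(times_s, baseline):
    """Return speed-up of each configuration relative to the baseline.

    times_s maps a configuration name to its run time in seconds;
    a value above 1.0 means faster than the baseline.
    """
    base = times_s[baseline]
    return {name: base / t for name, t in times_s.items()}


# Hypothetical run times in seconds for one benchmark (not real results).
times = {
    "10GbE-Cloud-32P": 120.0,
    "Tenjin-32P": 60.0,
    "Raijin-32P": 40.0,
}
for name, s in speedups(times, "10GbE-Cloud-32P").items():
    print(f"{name}: {s:.2f}x")
```

By construction the baseline configuration always scores 1.0, so bars above 1 in such a chart indicate a faster platform.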
  30. 30. Molecular Dynamics Code - NAMD - ApoA1, measured s time-step - 16 CPUs per Node - Lack of NUMA - TCP btl on cloud worked better than MXM [Chart: NAMD Speed-up. x-axis: number of CPUs (1 to 128); y-axis: speedup (0 to 50). Series: Tenjin, Tenjin-Containers, Raijin.]
  31. 31. Scaling still an issue – NUMA. Computational Physics: custom-written hybrid Monte Carlo code for generating gauge fields for Lattice QCD. For each iteration, calculating the Hamiltonian involves inverting a large, complex matrix using CGNE. Written in Fortran, using pure MPI (no threading). Courtesy: Dr. Benjamin Menadue. [Chart: compute time in seconds (lower is better; log scale, 1 to 10,000) vs. number of CPUs (1 to 128). Series: RDO TCP, RDO TCP MXM, RDO OIB, RDO OIB MXM, RJ TCP, RJ TCP MXM, RJ OIB, RJ OIB MXM.]
  32. 32. NCI’s commitment to HPC in the Cloud o NCI is engaged with many partners providing Cloud based HPC and HTC solutions to researchers. These are usually released as Open Source. o Slurm-Cluster o Enables a researcher to quickly and easily build a cluster in the cloud backed by the Slurm scheduler. It is targeted to Tenjin and NeCTAR clouds, but should work on any OpenStack deployment. o Intel Grant for Cluster in the Cloud o Worked with Amazon via LinkDigital o Raijin in a Box in preproduction and to be made available to the AWS market place. o How to build a supercomputer on AWS with spot instances.
  33. 33. NCI’s commitment to HPC in the Cloud o Applying NCI’s depth of expertise in HPC application tuning to deliver high performance, secure computing environments in the Cloud for Australian Researchers. o Bringing “Cloud to HPC” o Containers o Docker o “Bring your own workflow” model
  34. 34. Conclusion o We can support seamless high performance research workloads with large data access requirements across multiple platforms o Parallel jobs can run on the Cloud, but is it HPC? o Not at the moment. o Cloud is suited to high throughput computing (HTC), ease of provisioning and specific workloads o Traditional HPC provides the best performance for larger parallel applications with MPI requirements. o A common underlying hardware architecture shared between our HPC and Cloud platforms provides application portability and flexibility in provisioning a system in either role. o QPI and NUMA can have a large impact on performance o Single-node performance is on par with bare metal (if the application is not memory bound) o Locality Aware Scheduling (NUMA and Network awareness) o Our benchmarks were limited by the QPI performance of Sandy Bridge. o NCI plans to deploy bare-metal provisioning using Ironic
  35. 35. @NCInews Thank You