
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Supercomputing - Linaro Arm HPC Workshop



Event: Arm Architecture HPC Workshop by Linaro and HiSilicon
Location: Santa Clara, CA
Speaker: Andrew J Younge

Talk Title: Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Supercomputing

Talk Desc: The Vanguard program looks to expand the potential technology choices for leadership-class High Performance Computing (HPC) platforms, not only for the National Nuclear Security Administration (NNSA) but for the Department of Energy (DOE) and the wider HPC community. Specifically, there is a need to expand the supercomputing ecosystem by investing in and developing emerging, yet-to-be-proven technologies, addressing hardware and software challenges together, and proving out the viability of such novel platforms for production HPC workloads.

The first deployment of the Vanguard program will be Astra, a prototype petascale Arm supercomputer to be sited at Sandia National Laboratories during 2018. This talk will focus on the architectural details of Astra and the significant investments being made toward maturing the Arm software ecosystem. Furthermore, we will share initial performance results from our pre-general-availability testbed system and outline several planned research activities for the machine.

Bio: Andrew Younge is an R&D Computer Scientist at Sandia National Laboratories in the Scalable System Software group. His research interests include cloud computing, virtualization, distributed systems, and energy-efficient computing. Andrew has a Ph.D. in Computer Science from Indiana University, where he was the Persistent Systems fellow and a member of the FutureGrid project, an NSF-funded experimental cyberinfrastructure testbed. Over the years, Andrew has held visiting positions at the MITRE Corporation, the University of Southern California / Information Sciences Institute, and the University of Maryland, College Park. He received his Bachelor of Science and Master of Science degrees from the Computer Science Department at Rochester Institute of Technology (RIT) in 2008 and 2010, respectively.


  1. Vanguard Astra - Petascale ARM Platform for U.S. DOE/ASC Supercomputing. Presented by Andrew J. Younge. PIs: James H. Laros III, Kevin Pedretti, Si Hammond, Sandia National Laboratories. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
  2. Outline • Overview of Vanguard • Prototype HPC Architectures • Astra – Petascale ARM platform • ATSE – Advanced Tri-lab Software Environment • R&D Opportunities • Conclusion
  3. Vanguard Overview
  4. Vanguard Program: Advanced Architecture Prototype Systems • Prove viability of advanced technologies for NNSA integrated codes, at scale • Expand the HPC ecosystem by developing emerging, unproven technologies • Is it viable for future ATS/CTS platforms? • Increase technology AND integrator choices • Buy down risk and increase technology and vendor choices for future platforms • Ability to accept higher risk allows for more/faster technology advancement • Lowers/eliminates mission risk and significantly reduces investment • Jointly address hardware and software challenges • First prototype platform targeting ARM
  5. Where Vanguard Fits: a spectrum from Test Beds through Vanguard to ATS/CTS Platforms, trading higher risk and greater architectural choice for greater stability and larger scale. Test Beds • Small testbeds (~10-100 nodes) • Breadth of architectures • Brave users. Vanguard • Larger-scale experimental systems • Focused efforts to mature new technologies • Broader user base • Demonstrate viability for production use • NNSA Tri-lab resource. ATS/CTS Platforms • Leadership-class systems (Petascale, Exascale, ...) • Advanced technologies, sometimes first-of-a-kind • Broad user base • Production use
  6. Vanguard Phase 1: Astra
  7. Sandia’s NNSA/ASC ARM Platforms (timeline to today, markers at 2015, 2017, 2018): Hammer – Applied Micro X-Gene-1 (APM/HPE), 47 nodes, retired; Sullivan – Cavium ThunderX1 (Cavium/Penguin), 32 nodes; Mayer – pre-GA Cavium ThunderX2 (Cavium/HPE Comanche), 47 nodes; Astra (Vanguard) – petascale ARM platform, HPE Apollo 70, Cavium ThunderX2, Mellanox ConnectX-5 and Switch-IB2, 2592 nodes, delivery Aug/Sep 2018; leading toward future ASC platforms.
  8. Astra – “Per aspera ad astra” (through difficulties to the stars). Demonstrate viability of ARM for U.S. DOE supercomputing: 2.3 PFLOPs peak, 885 TB/s peak memory bandwidth, 332 TB memory, 1.2 MW.
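The headline figures on this slide can be reproduced from the node counts given later in the deck. A quick sanity-check sketch; the 8 double-precision FLOPs/cycle/core figure is our reading of the ThunderX2 core (two 128-bit FMA pipes), not a number stated on the slide:

```python
# Back-of-the-envelope check of the Astra peak figures.
nodes = 2592
sockets_per_node = 2
cores_per_socket = 28
ghz = 2.0
flops_per_cycle = 8  # assumption: 2 x 128-bit FMA pipes per core

# ghz carries a factor of 1e9 cycles/s, so dividing by 1e6 yields PFLOPs
peak_pflops = nodes * sockets_per_node * cores_per_socket * ghz * flops_per_cycle / 1e6
print(f"peak: {peak_pflops:.2f} PFLOPs")   # ~2.32, matching the 2.3 PFLOPs quoted

dimms_per_node = 16   # 8 channels/socket x 1 DIMM/channel x 2 sockets
gb_per_dimm = 8
total_tb = nodes * dimms_per_node * gb_per_dimm / 1000
print(f"memory: {total_tb:.0f} TB")        # ~332 TB, matching the slide
```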
  9. Vanguard-Astra Compute Node Building Block: HPE Apollo 70 Cavium TX2 node • Dual-socket Cavium ThunderX2 (CN99xx), 28 cores @ 2.0 GHz • 8 DDR4 controllers per socket, one 8 GB DDR4-2666 dual-rank DIMM per controller • Mellanox EDR InfiniBand ConnectX-5 VPI OCP • Tri-Lab Operating System Stack based on RedHat 7.5+
  10. Vanguard-Astra Compute Node (node diagram): two Cavium ThunderX2 ARM v8.1 sockets, 28 cores @ 2.0 GHz each; 8 DDR4 channels per socket, 1 DIMM per channel, for 16 x 8 GB DDR4-2666 dual-rank DIMMs per node; each socket has its own PCIe Gen3 x8 link to the Mellanox ConnectX-5 OCP network interface (one EDR link, 100 Gbps); 1 Gbps management Ethernet.
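The node's DIMM configuration also explains the system-wide bandwidth figure quoted earlier. A minimal sketch of the peak memory bandwidth arithmetic (DDR4-2666 at 8 bytes per transfer per channel):

```python
# Peak memory bandwidth from the Astra node configuration above.
mt_per_s = 2666e6        # DDR4-2666: 2666 mega-transfers/s
bytes_per_transfer = 8   # 64-bit data path per channel
channels = 8 * 2         # 8 channels/socket, dual socket

node_gbs = mt_per_s * bytes_per_transfer * channels / 1e9
system_tbs = node_gbs * 2592 / 1e3
print(f"{node_gbs:.1f} GB/s per node, {system_tbs:.0f} TB/s system-wide")
# ~341.2 GB/s per node and ~885 TB/s aggregate, consistent with the
# 885 TB/s peak on the Astra overview slide
```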
  11. Vanguard-Astra System Packaging: HPE Apollo 70 chassis holds 4 nodes; 18 chassis/rack gives 72 nodes/rack; 3 IB switches/rack (one 36-port switch per 6 chassis); 36 compute racks (9 scalable units of 4 racks each); 2592 compute nodes (5184 TX2 processors); 3 IB spine switches (each 540-port).
  12. Vanguard-Astra Infrastructure. Login & Service Nodes: 4 login/compilation nodes; 3 Lustre routers to connect to external Sandia filesystem(s); 2 general service nodes. Interconnect: EDR InfiniBand in a fat-tree topology, 2:1 oversubscribed for compute nodes, 1:1 full bandwidth for in-platform Lustre storage. System Management: dual HA management nodes running HPE Performance Software – Cluster Manager (HPCM); Ethernet management network connecting to all nodes; one boot server per scalable unit (288 nodes). In-platform Storage: all-flash Lustre storage system, 403 TB usable capacity, 244 GB/s throughput.
  13. Network Topology (fat-tree diagram): Mellanox Switch-IB2 EDR, radix-36 switches, 3-level fat tree with a 2:1 taper at L1. 108 L1 switches x 24 nodes/switch = 2592 compute nodes. Each L1 switch has 4 links to each of the three 540-port spine switches.
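The port accounting behind this topology can be checked directly from the numbers on the slide; the arithmetic below is ours, built from 36-port Switch-IB2 ASICs:

```python
# Fat-tree port accounting for the Astra network described above.
radix = 36           # Switch-IB2 ports per L1 switch
l1_switches = 108
nodes_per_l1 = 24    # downlinks to compute nodes
spines = 3           # 540-port director switches

uplinks_per_l1 = radix - nodes_per_l1             # 12 uplinks
taper = nodes_per_l1 / uplinks_per_l1             # 24 down : 12 up = 2:1 taper
links_per_spine = uplinks_per_l1 // spines        # 4 links to each spine
compute_nodes = l1_switches * nodes_per_l1        # 2592 nodes
spine_ports_used = l1_switches * links_per_spine  # 432 of 540 ports per spine

print(taper, links_per_spine, compute_nodes, spine_ports_used)
```

The 432-of-540 figure shows the spines have headroom beyond the compute partition, consistent with the full-bandwidth in-platform Lustre connections mentioned on the infrastructure slide.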
  14. Vanguard-Astra Advanced Power & Cooling. Cooling design point: 18.5C wet bulb, 20.0C, 1.5C approach. Projected power of the system by component, per constituent rack type:

                   per rack (W)                          total (kW)
                wall   peak  nominal  idle   racks   wall    peak  nominal  idle
                           (linpack)                              (linpack)
   Node racks  39888  35993   33805   6761    36    1436.0  1295.8  1217.0  243.4
   MCS300      10500   7400    7400    170    12     126.0    88.8    88.8    2.0
   Network     12624  10023    9021   9021     3      37.9    30.1    27.1   27.1
   Storage     11520  10000   10000   1000     2      23.0    20.0    20.0    2.0
   Utility      8640   5625    4500    450     1       8.6     5.6     4.5    0.5
   Total                                            1631.5  1440.3  1357.3  274.9

Extreme Efficiency: • The total 1.2 MW in the 36 compute racks is cooled by only 12 fan coils • These coils are cooled without compressors year-round, with no evaporative water at all for almost 6000 hours a year • 99% of the compute racks’ heat never leaves the cabinet, yet the system doesn’t require the internal plumbing of liquid disconnects and cold plates running across all CPUs and DIMMs • Builds on work by NREL and Sandia: https://www.nrel.gov/esif/partnerships-hpc.html
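The total-kW column of the projected-power table above is just per-rack watts times rack count; a small cross-check sketch (the slide's peak total rounds to 1440.3 kW, while recomputing row by row lands at 1440.2 kW, a per-row rounding artifact in the source):

```python
# Cross-check the projected-power table: per-rack W x racks -> total kW.
rows = {
    # name: (wall_W, peak_W, nominal_W, idle_W, racks)
    "Node racks": (39888, 35993, 33805, 6761, 36),
    "MCS300":     (10500,  7400,  7400,  170, 12),
    "Network":    (12624, 10023,  9021, 9021,  3),
    "Storage":    (11520, 10000, 10000, 1000,  2),
    "Utility":    ( 8640,  5625,  4500,  450,  1),
}
totals_kw = [sum(v[i] * v[4] for v in rows.values()) / 1000 for i in range(4)]
print([round(t, 1) for t in totals_kw])
# wall/peak/nominal/idle totals, closely matching the table's bottom row
```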
  15. 15. ATSE – Advanced Tri-lab Software Environment
  16. Advanced Tri-lab Software Environment Goals: build an open, modular, extensible, community-inspired, and vendor-adaptable ecosystem; prototype new technologies that may improve the DOE ASC computing environment (e.g., ML frameworks, containers, VMs). Leverage existing efforts • Tri-lab OS (TOSS) • OpenHPC & other programming environments • Exascale Computing Project (ECP) software technologies. ATSE stack timeline: Aug ’17 Tri-lab Arm software team formed; Dec ’17 ATSE design doc; Jul ’18 initial release target; Sep ’18 first use on Vanguard-Astra.
  17. Tri-Lab Software Effort for ARM. Accelerate the ARM ecosystem for ASC computing • Prove viability for ASC integrated codes running at scale • Harden compilers, math libraries, tools, communication libraries • Heavily templated C++, Fortran 2003/2008, gigabyte+ binaries, long compiles • Optimize performance, verify expected results. Build an integrated software stack • Programming environment (compilers, math libs, tools, MPI, OMP, SHMEM, I/O, ...) • Low-level OS (optimized Linux, network, filesystems, containers/VMs, ...) • Job scheduling and management (WLM, app launcher, user tools, ...) • System management (boot, system monitoring, image management, ...). Improve the 0-to-60 time: from ARM system arrival to useful work done.
  18. ARM Tri-lab Software Environment (ATSE) stack diagram, bottom to top: Vanguard hardware; base OS layer (vendor OS, TOSS, or an open OS such as OpenSUSE); cluster middleware (e.g., Lustre, SLURM); the ATSE programming environment "product" for Vanguard, with platform-optimized builds and a common look-and-feel across platforms, packaged as native installs, containers, or virtual machines; and the NNSA/ASC application portfolio on top. The diagram's key distinguishes open-source, closed-source, integrator-provided, and limited-distribution components from ATSE activity.
  19. Integrate Components from Many Sources (diagram from SNL Feb 12 TOSS meeting): the ATSE Packager pulls from TOSS (Koji build server), RHEL, EPEL, vendor software, and OpenHPC (Open Build Server) to produce the ATSE packages that feed the stack shown on the previous slide.
  20. Draft ATSE Timeline for 2018. March: continue software stack explorations and gap analysis on testbeds; set up OpenBuild server and replicate OHPC package builds for aarch64. April–May: develop ATSE Packager framework with the ability to pull packages from TOSS, RHEL, OpenHPC OBS, vendor, and other sources; identify initial component list. July: initial ATSE release 2018.0 on Mayer • Lab-distribution version: TOSS BaseOS + (ATSE-GCC | ATSE-ARM | ATSE-*) • Open-distribution version: SUSE and/or CentOS BaseOS + ATSE-GCC. Q3 2018 • Linux kernel optimization and HPC patches • Basic VM & container support. Q4 2018 • ATSE 2018.1 release • Initial upstream push to OpenHPC.
  21. 21. Astra Status and Research
  22. Acceptance Plan – Maturing the Stack
  23. Cavium Arm64 Providing Best-of-Class Memory Bandwidth (STREAM Triad chart comparing TX2 DDR4-2400, SkyLake 8160, Trinity Haswell, and Trinity KNL DDR).
  24. Network Bandwidth on ThunderX2 + Mellanox MLX5 EDR with Socket Direct. With Socket Direct, each socket has a dedicated path to the NIC (MLX5_0 and MLX5_3 on each node) over a single network link, exercised here with paired processes on each socket. OSU MPI multi-bandwidth results: Arm64 + EDR provides > 12 GB/s between nodes and > 75M messages/sec.
  25. Mini-App Performance on Cavium ThunderX2 (chart: speedup over Haswell E5-2680v3, 0–2.5x, for ThunderX2, Skylake 8160, and Haswell E5-2680 on MiniFE Solve GF/s, MiniFE SpMV GF/s, STREAM Triad, and LULESH Solve FOM). ThunderX2 provides high memory bandwidth: 6 channels (Skylake) vs. 8 in ThunderX2, visible in MiniFE SpMV and STREAM Triad. Slower compute reflects less optimization in the software stack, e.g., the non-SpMV kernels in MiniFE and LULESH with GCC and ARM compilers versus the Intel compiler.
  26. R&D Areas. Leverage containers and virtual machines • Support for machine learning frameworks • ARMv8.1 includes new virtualization extensions, SR-IOV • Working with Singularity on a full container solution. Evaluating parallel filesystems + I/O systems at scale • GlusterFS, Ceph, BeeGFS, Sandia Data Warehouse, … Resilience studies over Astra’s lifetime. Improved MPI thread support, matching acceleration. OS optimizations for HPC at scale • Exploring HPC-tuned Linux kernels through non-Linux lightweight kernels and multi-kernels • Arm-specific optimizations.
  27. Conclusion. Vanguard allows the DOE to take necessary risks to ensure a healthy HPC ecosystem for future production mission platforms • Increase technology choices • Prove the ability to run multi-physics production applications at scale. Tri-lab software stack effort to mature ARM for ASC computing • Harden compilers, math libs, and tools • Optimize performance, verify expected results • Increase modularity and openness of the software stack • Support traditional HPC and emerging AI + ML workloads. Sandia is now a member of the Linaro HPC SIG!
  28. Questions?
  29. Virt – From x86 to ARM64. What opportunities and challenges do we face when moving from an x86 world to an ARM world? Virtual machines • Near-native HPC performance with VMs is possible on x86 (Type 1 & Type 2 hypervisors, Hobbes / Palacios VM) • Can we avoid legacy issues of x86? • Anything new we can do with ARM? Containers • How do I build containers on my x86 laptop that run on Astra? • Focus on ABI compatibility • Leverage industry/enterprise without losing HPC focus.