
The Sierra Supercomputer: Science and Technology on a Mission


In this deck from the Stanford HPC Conference, Adam Bertsch from LLNL presents: The Sierra Supercomputer: Science and Technology on a Mission.

"LLNL just celebrated its 65th anniversary. Since 1952, the laboratory has been at the forefront of high performance computing. Initially, HPC was used to accelerate the design and testing of the nation's nuclear stockpile. Since the last U.S. nuclear test in 1992, HPC has been used to validate the safety, security, and reliability of stockpile without nuclear testing.

Our next flagship HPC system at LLNL will be called Sierra. A collaboration between multiple government and industry partners, Sierra, together with its sister system Summit at ORNL, will pave the way toward exascale computing architectures and predictive capability."

Watch the video: https://wp.me/p3RLHQ-i4K

Learn more: https://computation.llnl.gov/computers/sierra
and
http://hpcadvisorycouncil.com

Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter


Transcript:

  1. LLNL-PRES-746388 — The Sierra Supercomputer: Science and Technology on a Mission. Adam Bertsch, Sierra Integration Project Lead, Lawrence Livermore National Laboratory. February 20th, 2018. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC.
  2. Overview: § The Sierra Supercomputer § Collaboration — CORAL and OpenPOWER § Science and Technology on a Mission
  3. Where did Sierra start for me? § On a spring bicycle ride in Livermore five years ago.
  4. Sequoia and Sierra are the current and next-generation Advanced Technology Systems (ATS) at LLNL; Sierra will be the next ASC ATS platform. [Timeline figure, FY13–FY23: ATS line — Sequoia (LLNL), ATS 1 Trinity (LANL/SNL), ATS 2 Sierra (LLNL), ATS 3 Crossroads (LANL/SNL), ATS 4 El Capitan (LLNL), ATS 5 (LANL/SNL); Commodity Technology Systems (CTS) line — Tri-lab Linux Capacity Cluster II (TLCC II), CTS 1, CTS 2. Each system moves through procure & deploy, use, and retire phases.]
  5. The Sierra system that will replace Sequoia features a GPU-accelerated architecture. Compute node: 2 IBM POWER9 CPUs (Gen2 NVLink), 4 NVIDIA Volta GPUs (7 TFlop/s each, HBM2, Gen2 NVLink), NVMe-compatible 1.6 TB PCIe SSD, 256 GiB DDR4, 16 GiB of globally addressable HBM2 associated with each GPU, coherent shared memory. Compute rack: standard 19", warm-water cooling. Compute system: 4,320 nodes, 1.29 PB memory, 240 compute racks, 125 PFLOPS, ~12 MW. Interconnect: Mellanox single-plane EDR InfiniBand, 2-to-1 tapered fat tree. File system: Spectrum Scale, 154 PB usable storage, 1.54 TB/s read/write bandwidth.
  6. Sierra represents a major shift for our mission-critical applications. 2004–2014: IBM Blue Gene (BG/L, BG/P, BG/Q), along with copious amounts of Linux-based cluster technology, dominates the HPC landscape at LLNL; apps scale to an unprecedented 1.6M MPI ranks, but are still largely MPI-everywhere on "lean" cores. 2014: heterogeneous GPGPU-based computing begins to look attractive with NVLINK and Unified Memory (P9/Volta); apps must undertake a 180° turn from Blue Gene toward heterogeneity and large-memory nodes. 2015–20??: LLNL believes heterogeneous computing is here to stay, and Sierra gives us a solid leg up in preparing for exascale and beyond.
  7. Cost-neutral architectural tradeoffs. § DRAM cost came in at double expectations § We evaluated two options to close the cost gap: buy half as much memory, or taper the network § We selected both options! § Full bandwidth from dual-ported Mellanox EDR HCAs to TOR switches § Half bandwidth from TOR switches to director switches § An economic trade-off that provides approximately 5% more nodes. This decision, counter to prevailing wisdom for system design, benefits Sierra's planned UQ workload: 5% more UQ simulations at a performance loss of < 1%.
  8. Sierra system architecture details

                                        Sierra         uSierra        rzSierra
     Nodes                              4,320          684            54
     POWER9 processors per node         2              2              2
     GV100 (Volta) GPUs per node        4              4              4
     Node peak (TFLOP/s)                29.1           29.1           29.1
     System peak (PFLOP/s)              125            19.9           1.57
     Node memory (GiB)                  320 (256+64)   320 (256+64)   320 (256+64)
     System memory (PiB)                1.29           0.209          0.017
     Interconnect                       2x IB EDR      2x IB EDR      2x IB EDR
     Off-node aggregate b/w (GB/s)      45.5           45.5           45.5
     Compute racks                      240            38             3
     Network/infrastructure racks       13             4              0.5
     Storage racks                      24             4              0.5
     Total racks                        277            46             4
     Peak power (MW)                    ~12            ~1.8           ~0.14
  9. 9. LLNL-PRES-746388 9 CORAL Targets compared to Sierra § Target speedup over current systems of 4x on Scalable benchmarks … § … and 6x on Throughput benchmarks § Aggregate memory of 4 PB and ≥ 1 GB per MPI task (2 GB preferred) § Max power of system and peripherals ≤ 20MW § Mean Time Between Application Failure that requires human intervention ≥ 6 days § Architectural Diversity § Delivery in 2017 with acceptance in 2018 § Peak Performance ≥ 100 PF
  10. Collaboration
  11. Sierra is part of CORAL, the Collaboration of Oak Ridge, Argonne and Livermore. CORAL is the next major phase in the U.S. Department of Energy's scientific computing roadmap and path to exascale computing. Modeled on the successful LLNL/ANL/IBM Blue Gene partnership (Sequoia/Mira), it is a long-term contractual partnership with 2 vendors: 2 awardees for 3 platform acquisition contracts (ANL Aurora, ORNL Summit, LLNL Sierra) plus 2 nonrecurring engineering (NRE) contracts. [Diagram: one RFP leading to the five contracts; LLNL's IBM Blue Gene systems — BG/L, BG/P (Dawn), BG/Q (Sequoia).]
  12. OpenPOWER collaboration enabled technology behind Sierra
  13. CORAL NRE will provide significant benefit to the final systems: § Motherboard design § Water-cooled compute nodes § GPU resilience § Switch-based collectives, FlashDirect § Open source compiler infrastructure § System diagnostics § System scheduling § Burst buffer § GPFS performance and scalability § Cluster management § Open source tools § Center of Excellence. Eight joint lab/contractor working groups are actively working on all the above and more.
  14. CORAL NRE drove Spectrum Scale advances that will vastly improve the Sierra file system. Original plan of record: 120 PB delivered capacity, conservative drive capacity estimates, 1 TB/s write / 1.2 TB/s read, concern about metadata targets. Modified (cost neutral), via NRE plus hardware advancements: 154 PB delivered capacity, substantially increased drive capacities, 1.54 TB/s write/read, and enhanced Spectrum Scale metadata performance (many optimizations already released). Close partnerships lead to ecosystem improvements.
  15. Outstanding benchmark analysis by IBM and NVIDIA demonstrates the system's usability. Projections included code changes that showed a tractable annotation-based approach (i.e., OpenMP) will be competitive. [Figure: CORAL application performance projections, CPU-only vs. CPU + GPU, for Scalable Science benchmarks (QBOX, LSMS, HACC, NEKbone) and Throughput benchmarks (CAM-SE, UMT2013, AMG2013, MCB); the GPU-accelerated system is expected to deliver substantial performance at the system level compared to a CPU-only configuration.]
  16. 3–4 years to prepare applications for Sierra. 2014: CORAL contracts at LLNL and ORNL go to IBM/NVIDIA — 3–4 years of "runway". Today: system delivery underway; initial open science use about to begin; transition to mission use in ~10 months. Goal: take off flying when the machine "hits the floor".
  17. Application enablement through the Center of Excellence — the original (and enduring) concept. Vendor application/platform expertise meets staff (code developers) working on applications and key libraries. Elements: hands-on collaborations, on-site vendor expertise, customized training, early delivery platform access, application bottleneck analysis, and both portable and vendor-specific solutions. Technical areas: algorithms, compilers, programming models, resilience, tools, ... Outcomes: codes tuned for next-gen platforms, rapid feedback to vendors on early issues, publications and early science, improved software tools and environment, performance portability. A steering committee oversees the effort; the LLNL/Sierra CoE is jointly funded by the ASC Program and the LLNL Institution.
  18. RAJA is our performance-portability solution that uses standard C++ idioms to target multiple back-end programming models. § Decouple loop traversal and iterate (body) § An iterate is a "task" (aggregate, (re)order, ...) § IndexSet and execution policy abstractions simplify exploration of implementation/tuning options without disrupting source code. RAJA concepts: pattern (forall, reduction, scan, etc.), execution policy (how the loop runs: programming-model back end, etc.), and index (index sets, segments to partition and order iterations). RAJA allows us to write once and target multiple back ends. C-style for-loop:

     double* x ; double* y ;
     double a, tsum = 0, tmin = MYMAX;
     for ( int i = begin; i < end; ++i ) {
       y[i] += a * x[i] ;
       tsum += y[i] ;
       if ( y[i] < tmin ) tmin = y[i];
     }

     The same loop after the RAJA transformation:

     double* x ; double* y ; double a ;
     RAJA::SumReduction<reduce_policy, double> tsum(0);
     RAJA::MinReduction<reduce_policy, double> tmin(MYMAX);
     RAJA::forall< exec_policy > ( index_set , [=] (int i) {
       y[i] += a * x[i] ;
       tsum += y[i];
       tmin.min( y[i] );
     } );
  19. Early Access systems provide critical pre-Sierra generation resources. Early access to CORAL software technology is based on IBM's current technical programming offerings with enhancements for new function and scalability, including an initial messaging stack implementation, an initial compute node kernel, and compilers (open source LLVM, PGI, XL). Early Access compute system: 18 compute nodes, EDR IB switch fabric, Ethernet management, air-cooled. Early Access compute node: "Minsky" HPC server, 2x POWER8+ processors, 4x NVIDIA GP100 with NVLINK, 256 GB SMP (32x 8 GB DDR4). Initial Early Access GPFS: GPFS management servers, 2 POWER8 storage controllers, 1 GL-2 (2x58 6 TB SAS drives) or 1 GL-4 (4x58 6 TB SAS drives), EDR InfiniBand switch fabric; enhanced to GL-6 (6x58 6 TB SAS drives).
  20. Three LLNL Early Access systems support ASC and institutional preparations for Sierra: Ray (896.4 TF; its compute nodes have 1.6 TB NVMe drives), Shark (597.6 TF), and Manta (597.6 TF), each with compute, storage, and infrastructure racks plus ESS storage.
  21. Sierra COE focus over time, and how it has necessarily shifted as technology and code readiness advance (~5-year span). Early: training, topical "deep dives", development of portability abstractions (RAJA), early explorations on key kernels, exploration with proxy apps, identifying algorithmic and data-structure changes, hack-a-thons, use of available non-IBM GPU resources. Approaching delivery of Sierra: transition to early delivery hardware, focus on production apps, performance prediction and modeling, portability testing. After delivery: transition to Sierra, performance optimization, delivering "early science" results.
  22. The COE has broadened to encompass most Sierra preparation activities. § The Sierra COE began with vendor proposals, as part of the Sierra procurement, for proposed NRE (non-recurring engineering) activities — Inspired by ORNL Titan activities — IBM Research and NVIDIA DevTech are staffed with a cadre of top-notch application experts § It has become an organizing principle beyond vendor contributions — Ties to Livermore Computing (LC) CORAL working groups (compilers and tools) — ASQ division staff continue to provide key CS support to ASC code teams — CASC is leading several projects (RAJA, ICOE tools and libraries) § The LLNL Advanced Architecture and Portability Specialists (AAPS) team has become an integral and enduring part of our COE strategy — Providing targeted expertise to both ASC and ICOE teams. [Diagram: Sierra COE = Sierra NRE subcontract with IBM + Institutional COE + LLNL staff (code teams + AAPS).]
  23. Preliminary ASC application results on the EA system (single-node results; observed GPU speedups are 4 GPUs vs. 2 POWER8s)

     Application/package/lib    Status/notes                                             Observed GPU speedup
     Hydro                      Most of code ported to RAJA                              7–10x (package dependent)
     Equation of state          Ported to RAJA; still porting setup/initialization       11x
                                and some less-used interpolation routines
     CG solver                  Internal (e.g., non-hypre) solver                        10x
     Hypre                      Setup not ported to GPU, still a potential bottleneck    9x
     Unstructured deterministic Basing work on UMT benchmark results; performance        With UMT: 2–2.5x streaming,
     transport                  limited by NVLINK bandwidth (streaming algorithm)        4–8x in memory
     Structured deterministic   Excellent results with Kripke mini-app via RAJA;         Kripke (mini-app): 14x
     transport                  porting main application and expect similar results
     Monte Carlo transport      Using "big kernel" approach, and exploring via           ~1–2x
                                Quicksilver mini-app

     Outlook legend (per the slide's color coding): optimistic / optimistic, with caveats / unknown, but promising / may not be GPU-friendly / GPUs bad.

     Little or no tuning has been done for the POWER CPU yet, and compilers are improving. Most codes currently see a 1.1–1.4x speedup over comparable Xeon CPUs (e.g., Haswell).
  24. Science and Technology on a Mission: Intelligence, Biosecurity, Nonproliferation, Defense, Science, Energy, Weapons, Counterterrorism
  25. Stockpile Stewardship and the ASC Program: § Annual certification of the U.S. nuclear stockpile without nuclear testing
  26. The Advanced Simulation and Computing (ASC) Program is LLNL's main HPC funding source. § ASC delivers the leading-edge, high-end simulation capabilities needed to meet U.S. nuclear weapons assessment and certification requirements, including: weapon codes, weapon science, computing platforms, and supporting infrastructure. § ASC modeling and simulation provides: — a computational surrogate for nuclear testing, which is central to U.S. national security — the ability to model the extraordinary complexity of nuclear weapons systems, which is essential to establish confidence in the performance of our aging stockpile — the necessary tools for nuclear weapons scientists and engineers to gain a comprehensive understanding of the entire weapons lifecycle, from design to safe processes for dismantlement. § Through close coordination with other government agencies, ASC tools also play an important role in supporting nonproliferation efforts, emergency response, and nuclear forensics.
  27. Extreme-scale computing is needed across the spectrum of NNSA's national security missions. [Figure: model complexity vs. simulation scale, from ab initio systems (transition metals, noble metals, actinide metals; (100 nm)³ to (10 µ)³) through 2D and 3D weapons science, supporting the nuclear stockpile (safety, surety, reliability, robustness) and non-proliferation and nuclear counterterrorism.]
  28. Advanced technology insertion will establish new directions for high-performance computing. CORAL technology leads toward exascale technology and, beyond that, to memory-intensive systems, data-analytics systems (e.g., Catalyst), machine learning and neuromorphic technologies, quantum technologies, and cognitive simulation systems. Roadmap (2015 → 2018 → 2021 → 2025): hybrid accelerated simulation, analytics and simulation convergence, workflow steering, and active learning for design optimization and UQ. [The slide reproduces an excerpt from a June 2014 Photonics Spectra article on quantum computing platforms — NV-center, photonic, and superconductor qubits.]
  29. Integrated machine learning and simulation could enable dynamic validation of complex models. CORAL computing architectures power the dynamic validation loop: high-fidelity simulation → ensembles of simulation sets → high-dimensional model parameters → distribution of predicted properties → hypothesis generation (use the ML model to predict parameters for experimental data) → requests for new experiments and data.
  30. Sierra and its EA systems are beginning an accelerator-based computing era at LLNL. § The advantages that led us to select Sierra generalize — Power efficiency — Network advantages of "fat nodes" — A clear path for performance portability — Balancing capabilities and costs implies complex memory hierarchies § Broad collaboration is essential to successful deployment of increasingly complex HPC systems — Computing centers collaborate on requirements — Vendors collaborate on solutions. Heterogeneous computing on Sierra represents a viable path to exascale.
  31. (closing slide)
