How Open Technology Helped DOE Labs Place Sixteen Supercomputers on the Top500

Penguin Computing
Sid Mair, OCP 2019
© 2019 Penguin Computing
About Penguin Computing
▪ Specialized high-performance computing (HPC), bare
metal HPC in the cloud, AI, and storage technologies
▪ Coupled with leading-edge design, implementation,
hosting, and managed services including sys-admin
and storage-as-a-service, and highly rated customer
support
▪ More than 2,500 customers in 50 countries across
nine major vertical markets
▪ Over 300 OCP racks delivered to date based on
Tundra™ Extreme Scale Design
▪ 20 years of artificial intelligence (AI), engineering, and computer
science for startups, Fortune 500, government, and academic
organizations

OCP Clusters in Top500
Sixteen supercomputers designed and built by Penguin Computing have placed on the TOP500 list since 2016
▪ All are OCP-based and deployed at U.S. national labs under the U.S. Department of Energy
▪ Part of the DOE’s Advanced
Simulation and Computing (ASC)
program
▪ Provide simulation-based
confidence in the nuclear
stockpile, an alternative to
explosive test-based confidence

CTS-1: A Perfect Opportunity for OCP Design
▪ Supports the National Nuclear Security Administration (NNSA) in ensuring nuclear stockpile stewardship in compliance with the Comprehensive Nuclear-Test-Ban Treaty (CTBT)
▪ 30,000 Broadwell / Skylake dual-processor nodes to date
▪ Commodity clusters brought down the cost of HPC systems from approximately $100 million per teraFLOP in 1995 to less than $5,000 per teraFLOP today (a factor of 20,000), with greater computing power and energy efficiency in each generation
▪ Exemplifies how OCP-based technology can give organizations both value and
performance
▪ Provides flexibility in CPU architectures, accelerators, and interconnects
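The cost claim in the list above can be checked with simple arithmetic. The following is a minimal sketch: the two dollar figures come from the slide, while treating "today" as 2019 (the year of the talk) and the variable names are assumptions made for illustration.

```python
import math

# Figures quoted on the slide (US dollars per teraFLOP).
COST_PER_TFLOP_1995 = 100_000_000
COST_PER_TFLOP_TODAY = 5_000

# Assumption: "today" means 2019, the year of the presentation.
YEARS_ELAPSED = 2019 - 1995

# Overall reduction factor; the slide quotes 20,000.
reduction_factor = COST_PER_TFLOP_1995 / COST_PER_TFLOP_TODAY

# Equivalent steady yearly improvement and cost-halving time.
annual_factor = reduction_factor ** (1 / YEARS_ELAPSED)
halving_years = math.log(2) / math.log(annual_factor)

print(f"overall reduction: {reduction_factor:,.0f}x")
print(f"average yearly improvement: {annual_factor:.2f}x")
print(f"cost halves roughly every {halving_years:.1f} years")
```

Because the slide says "less than $5,000", the factor of 20,000 is a lower bound on the improvement rather than an exact value.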

The Key -- Uniquely Flexible, Dense Tundra Design
Baseline Tundra™ Extreme Scale
▪ First was 1OU, 3 node, CPU compute, housed in v1 rack
▪ Supports up to 102 nodes per rack with switching
▪ May include GPGPU accelerated servers
▪ High-speed, low-latency interconnects: workloads synchronized within microseconds between nodes
▪ Air or liquid cooling (Direct-to-Chip or Rear Doors)
▪ Latest generation open bridge rack
▪ Accessible via cloud through Penguin Computing On-Demand® (POD)
▪ Multiple storage options
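The density figures above (three nodes per 1OU, up to 102 nodes per rack) imply how much rack height the compute nodes occupy. A minimal sketch, assuming every node in a full rack uses the 3-node 1OU form factor; the constant names are illustrative and the rack's total OpenU height is not stated on the slide.

```python
# Figures from the slide.
NODES_PER_OU = 3          # baseline form factor: 3 nodes in 1 OpenU
MAX_NODES_PER_RACK = 102  # maximum quoted "with switching"

# Assumption: a full rack is populated only with 3-node 1OU units.
compute_ou, leftover_nodes = divmod(MAX_NODES_PER_RACK, NODES_PER_OU)

print(f"compute occupies {compute_ou} OU, {leftover_nodes} nodes left over")
```

Whatever rack height remains beyond these compute units would be available for switching and power shelves.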

Vertiv HPC Power Shelf (3 x 12V DC Bus Bar) [Feb’18 Updated]
▪ Shelf dimensions: 3U (H) x 19in (W) x 25.8in (D)
▪ Many voltage options available (176-305Vac)
▪ Nine (9) slots for 3300W rectifiers and BBUs
▪ 208V single power system output (max) = 16.5kW (N+1)
▪ 277/480V single power system (max) = 26.4kW (N+1)
▪ 277/480V dual-zone configuration (max) = 52.8kW (N+1)
▪ Redundant power options available
▪ Provides 3 pairs of DC output bus bar connections
▪ Temperature environment: -10C to +45C (+14F to +113F)
▪ 2 AC convenience outlets for switching (Gen II)
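The N+1 capacity figures for the power shelf can be cross-checked from the rectifier rating. This is a sketch under stated assumptions: all nine slots hold 3300 W rectifiers (no BBUs), N+1 means one rectifier is held in reserve, and a dual-zone configuration is simply two such zones. The function name is hypothetical and nothing here is taken from Vertiv documentation.

```python
RECTIFIER_WATTS = 3300  # per-rectifier rating from the slide
SLOTS = 9               # slots per shelf from the slide


def n_plus_one_kw(installed, rectifier_watts=RECTIFIER_WATTS):
    """Usable output in kW when one installed rectifier is kept as a spare."""
    if installed < 2:
        raise ValueError("N+1 redundancy needs at least two rectifiers")
    return (installed - 1) * rectifier_watts / 1000


# Assumption: every slot is populated with a rectifier.
single_zone_kw = n_plus_one_kw(SLOTS)

# Assumption: dual zone is two independent single zones.
dual_zone_kw = 2 * single_zone_kw

# How many rectifiers' worth of rated output the 208V figure represents.
rectifiers_at_208v = 16_500 / RECTIFIER_WATTS

print(single_zone_kw, dual_zone_kw, rectifiers_at_208v)
```

Under these assumptions the 277/480V figures match the slide. The 208V figure equals five rectifiers' worth of rated output; the slide does not say whether that reflects fewer populated slots or derating at the lower input voltage, so the sketch only reports the ratio.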

Compute Node: Relion 1930e
CPU Processing
▪ Three Nodes in 1 OpenU Form Factor
▪ Dual Socket Intel® Xeon® E5-2600v4 per node
▪ Up to 1TB DDR4-2400MHz (8x DIMMs) per node
▪ Intel® C612 Chipset
▪ 1x Dedicated BMC
▪ 1x PCIe 3.0 x16
▪ 1x 2.5" Fixed SATA SSD
▪ Dual 1GbE/RJ45 LOM
▪ Optional 1x FDR LOM
▪ Supports Asetek Direct-to-Chip Cooling

Compute Node: Relion XO1114GT
GPU Computing
▪ 1 OU Form Factor
▪ Dual Intel Xeon Cascade Lake-SP / Skylake-SP with Intel Omni-Path
▪ Up to 2TB DDR4-2933MHz (16x DIMMs)
▪ Intel® Lewisburg Chipset
▪ 1x Dedicated BMC
▪ Supports 4x GPGPU (Nvidia Tesla Volta, PCIe)
▪ 2x PCIe 3.0 Low profile (Speed depends on topology)
▪ 4x 2.5" SATA SSD
▪ Dual 1GbE/RJ45 LOM
▪ Supports Asetek Direct-to-Chip Cooling (CPUs and GPUs)
▪ Flexible PCIe Topology

CTS-1 Delivered Systems Through Spring 2019

Flexible Architecture Enables Scalable Configurations
Tundra Extreme Scale, Xeon E5-2695v4 18C 2.1GHz, Intel Omni-Path
▪ LLNL “Quartz”: originally 14 SU, expanding by 2 SU in 2019 for a total of 16 SU
▪ LLNL “Jade”: 14 SU
▪ LANL “Grizzly” with “Snow”: 10 SU
▪ SNL “Serrano” and “Cayenne”: separate 6 SU systems
▪ SNL “Eclipse”: base 6 SU purchased in 2017, expanded by 2 SU in 2018
▪ LANL “Badger”: purchased in 2017, doubled in 2018
▪ LANL “Kodiak” (GPU cluster): purchased in 2017, expanded in 2018
▪ LANL “Fire”, “Ice”, and “Cyclone”: each a separate 6 SU system

OCP for AI
NEW Tundra-based “Corona” AI cluster leverages the flexibility, density, and efficiency of OCP
▪ AMD EPYC™ processors, AMD Radeon™ Instinct™ GPU accelerators
▪ 383 teraFLOPS (floating point operations per second)
▪ 170 two-socket nodes incorporating 24-core AMD EPYC™ 7401
processors and a PCIe 1.6 Terabyte (TB) nonvolatile (solid-state)
memory device
▪ Half of compute nodes utilize 4 AMD Radeon Instinct™ MI25 GPUs per
node, delivering 4.2 petaFLOPS of FP32 peak performance
▪ Connected via a Mellanox HDR 200 Gigabit InfiniBand network
▪ Remaining compute nodes may be upgraded with future GPUs
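The Corona figures above can be related to one another with a short calculation. A minimal sketch using only numbers from the slide; it assumes "half of compute nodes" means exactly half of the 170 nodes, and the variable names are illustrative.

```python
# Figures from the slide.
TOTAL_NODES = 170
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 24            # AMD EPYC 7401
GPUS_PER_GPU_NODE = 4            # AMD Radeon Instinct MI25
QUOTED_FP32_PEAK_PFLOPS = 4.2    # aggregate GPU peak quoted on the slide

# Assumption: exactly half of the nodes carry GPUs.
gpu_nodes = TOTAL_NODES // 2
total_gpus = gpu_nodes * GPUS_PER_GPU_NODE

# Per-GPU FP32 peak implied by the quoted aggregate, in teraFLOPS.
implied_tflops_per_gpu = QUOTED_FP32_PEAK_PFLOPS * 1000 / total_gpus

# Total CPU cores across the cluster.
total_cpu_cores = TOTAL_NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET

print(f"{gpu_nodes} GPU nodes, {total_gpus} GPUs")
print(f"implied FP32 peak per GPU: {implied_tflops_per_gpu:.2f} teraFLOPS")
print(f"{total_cpu_cores} CPU cores")
```

The implied per-GPU value of a little over 12 teraFLOPS appears consistent with the FP32 peak commonly quoted for the MI25, which suggests the 4.2 petaFLOPS figure is a straightforward sum of per-GPU peaks.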

Questions?
www.penguincomputing.com
1-888-PENGUIN