Hardware architecture of Summit Supercomputer
SUMMIT
SUPERCOMPUTER
Supervisor: Dr. R. Venkatesan
Presentation by: Vigneshwar Ramaswamy
Masc. in Computer Engineering
MUN ID: 201990029
Memorial University of Newfoundland, Canada
Summit Supercomputer Architecture 1
Outline
• Introduction
• Summit Overview
• Specification of Summit
• IBM Power9 Architecture
• NVIDIA Tesla V100 Architecture
• Interconnect
• Application
Introduction
• Summit held the No. 1 spot on the TOP500 list from June 2018 until June 2020, when Fugaku overtook it.
• It is now ranked 2nd on the TOP500, with 148.6 PFLOPS sustained on the High Performance Linpack (HPL) benchmark.
• It is ranked 8th on the Green500, with a power efficiency of 14.719 GFLOPS/W.
• Between 2018 and 2020, Summit topped the HPCG benchmark and was used by 5 of the 6
Gordon Bell Prize finalist teams.
• Summit has reached exa-operations-per-second (exaops) rates: it achieved 1.88
exaops during a genomic analysis and is expected to reach 3.3 exaops using mixed-precision
calculations.
Summit Overview and Specifications
• Processors: IBM POWER9™ (2 per node)
• GPUs: 27,648 NVIDIA Volta V100s (6 per node)
• Theoretical peak (Rpeak) performance: 200 PFLOPS
• Linpack (Rmax) performance: 148.6 PFLOPS
• Cores: 2,414,592
• Storage capacity: 250 PB
• Nodes: 4,608
• Memory per node: 512 GB DDR4 + 96 GB HBM2 (1/2TF, CPU-GPU accessing)
• Non-volatile memory per node: 1,600 GB
• Total system memory: >10 PB (DDR4 + HBM2 + non-volatile)
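The per-node figures above can be cross-checked against the system totals with a little arithmetic. The sketch below is only a sanity check with illustrative variable names. The 16 GB of HBM2 per V100 and the "22C" core count come from later slides in this deck. The 80 SMs per GPU is an assumption taken from NVIDIA's Volta white paper, and counting each SM as one "core" is simply one decomposition that happens to reproduce the quoted core total.

```python
# Sanity-check Summit's system totals from the per-node figures on this slide.
NODES = 4608
CPUS_PER_NODE = 2
GPUS_PER_NODE = 6
DDR4_PER_NODE_GB = 512
NVM_PER_NODE_GB = 1600
HBM2_PER_GPU_GB = 16      # from the V100 slide later in this deck
CPU_CORES_PER_SOCKET = 22 # "22C" in the comparison table later in this deck
SMS_PER_GPU = 80          # assumption: enabled SMs per Tesla V100

total_gpus = NODES * GPUS_PER_NODE
hbm2_per_node_gb = GPUS_PER_NODE * HBM2_PER_GPU_GB
total_memory_gb = NODES * (DDR4_PER_NODE_GB + hbm2_per_node_gb + NVM_PER_NODE_GB)

# One way to reproduce the quoted core count: CPU cores plus GPU SMs per node.
cores_per_node = CPUS_PER_NODE * CPU_CORES_PER_SOCKET + GPUS_PER_NODE * SMS_PER_GPU
total_cores = NODES * cores_per_node

print(total_gpus)                      # 27648
print(hbm2_per_node_gb)                # 96
print(round(total_memory_gb / 1e6, 2)) # 10.17 (decimal PB), consistent with ">10 PB"
print(total_cores)                     # 2414592
```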
Summit Overview and Specifications
• Interconnect: Mellanox EDR 100 Gb/s InfiniBand, non-blocking fat-tree topology
• 25 GB/s between nodes
• In-network computing acceleration for communication frameworks such as
MPI (Message Passing Interface)
• Peak power consumption: 13 MW
• Operating system: Red Hat Enterprise Linux (RHEL) version 7.4
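The 25 GB/s node bandwidth follows from the 100 Gb/s link rate if each node has two EDR rails, as the Interconnect slide later in this deck states. The sketch below also derives the power draw implied by the Linpack result and the Green500 efficiency quoted in the introduction. That implied figure is a back-of-the-envelope estimate, not a measurement from the source.

```python
# Node injection bandwidth from the EDR link rate (dual-rail assumed).
EDR_LINK_GBIT_S = 100
RAILS_PER_NODE = 2
node_bw_gbyte_s = EDR_LINK_GBIT_S * RAILS_PER_NODE / 8  # bits -> bytes

# Power implied by Rmax and the Green500 efficiency figure.
RMAX_GFLOPS = 148.6e6            # 148.6 PFLOPS expressed in GFLOPS
GREEN500_GFLOPS_PER_WATT = 14.719
implied_power_mw = RMAX_GFLOPS / GREEN500_GFLOPS_PER_WATT / 1e6  # W -> MW

print(node_bw_gbyte_s)             # 25.0
print(round(implied_power_mw, 1))  # 10.1, below the 13 MW peak figure
```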
Summit Nodes
FIGURE 1: SUMMIT NODE BLOCK DIAGRAM
SOURCE: Summit, Oak Ridge National Laboratory (official web page), https://www.olcf.ornl.gov/summit/
IBM POWER9 Processor
• Each of Summit’s POWER9 processors has 22 active
cores of the 24 on the die (4 hardware threads per core).
• PCI Express (PCIe) Gen4.
• NVLink 2.0
• 14 nm FinFET semiconductor process with 8.0
billion transistors
• High-bandwidth signaling technology
• 16 Gb/s interface: local SMP
• 25 Gb/s interface (25G Link): accelerator,
remote SMP
FIGURE 2: POWER9 ARCHITECTURE
SOURCE: S. K. Sadasivam, B. W. Thompto, R. Kalla and W. J. Starke, "IBM Power9
Processor Architecture," in IEEE Micro, vol. 37, no. 2, pp. 40-51, Mar.-Apr.
2017. doi: 10.1109/MM.2017.40
Core pipeline
• The microarchitecture has a shorter pipeline
than POWER8.
• Removes the instruction grouping
technique.
• Introduces new features to proactively
avoid hazards in the load-store unit (LSU)
and improve the LSU’s execution efficiency.
• Can complete up to 128 instructions per
cycle (SMT4).
• New lock-management control improves
performance.
FIGURE 3: POWER9 VS POWER8 PIPELINE STAGES
SOURCE: S. K. Sadasivam, B. W. Thompto, R. Kalla and W. J. Starke, "IBM Power9
Processor Architecture," in IEEE Micro, vol. 37, no. 2, pp. 40-51, Mar.-Apr.
2017. doi: 10.1109/MM.2017.40
Key components of Power9 core
Figure 4: SMT4 core
Figure 5: SMT8 core
Figure 6: Power9 SMT4 core. The detailed core block diagram
shows all the key components of the Power9 core.
Cache Capacity of
POWER9 Processor
• L1I: 32 KiB (per core, 8-way set associative)
• L1D: 32 KiB (per core, 8-way)
• L2: 512 KiB (per pair of cores)
• L3: 120 MiB eDRAM, 20-way
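The capacities and associativities above determine how many sets each cache has. The sketch below works that out under two assumptions that are not on the slide: a 128-byte cache line, and a 24-core die organized as 12 core pairs that each hold an equal slice of the L3. The helper name `num_sets` is illustrative.

```python
KIB = 1024
MIB = 1024 * KIB

def num_sets(capacity_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    return capacity_bytes // (ways * line_bytes)

LINE_BYTES = 128   # assumed cache line size
CORE_PAIRS = 12    # assumed: 24 cores on the die, grouped in pairs

l1_sets = num_sets(32 * KIB, ways=8, line_bytes=LINE_BYTES)
l3_slice_mib = 120 // CORE_PAIRS
l3_slice_sets = num_sets(l3_slice_mib * MIB, ways=20, line_bytes=LINE_BYTES)

print(l1_sets)        # 32 sets in each 32 KiB, 8-way L1
print(l3_slice_mib)   # 10 MiB of L3 per core pair
print(l3_slice_sets)  # 4096 sets in each 20-way L3 slice
```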
FIGURE 7: SMT8 Cache
SOURCE: S. K. Sadasivam, B. W. Thompto, R. Kalla and W. J. Starke, "IBM Power9 Processor Architecture," in IEEE
Micro, vol. 37, no. 2, pp. 40-51, Mar.-Apr. 2017. doi: 10.1109/MM.2017.40
NVIDIA Tesla V100
GPU Architecture
• The GV100 GPU is built with 21 billion transistors.
• Peak double-precision (FP64) performance:
7.8 TFLOP/s.
• Peak single-precision (FP32) performance:
15.7 TFLOP/s.
• The full GV100 die has 5376 FP32 cores, 5376 INT32 cores,
2688 FP64 cores, 672 Tensor Cores, and 336
texture units across 84 SMs. The Tesla V100 enables 80 SMs,
giving 5120 FP32 cores, 5120 INT32 cores, 2560 FP64 cores,
640 Tensor Cores, and 320 texture units.
• Eight 512-bit memory controllers control
access to the 16 GB of HBM2 memory.
• 6 MB L2 cache shared by the SMs.
• NVIDIA’s NVLink interconnect passes data
between GPUs as well as between CPU and GPU.
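The quoted peak rates can be approximated from the core counts. The sketch below assumes the 80 enabled SMs of the shipping Tesla V100 (5120 FP32 and 2560 FP64 cores) and a 1.53 GHz boost clock. Both values come from NVIDIA's white paper rather than this slide, so treat them as assumptions. It counts one fused multiply-add as two floating-point operations.

```python
def peak_tflops(cores, flops_per_core_per_clock, clock_ghz):
    """Peak throughput in TFLOP/s for a given core count and clock."""
    return cores * flops_per_core_per_clock * clock_ghz / 1000.0

BOOST_CLOCK_GHZ = 1.53  # assumed boost clock
FMA_FLOPS = 2           # one fused multiply-add counts as 2 FLOPs

fp32_tflops = peak_tflops(5120, FMA_FLOPS, BOOST_CLOCK_GHZ)
fp64_tflops = peak_tflops(2560, FMA_FLOPS, BOOST_CLOCK_GHZ)

print(round(fp32_tflops, 1))  # 15.7
print(round(fp64_tflops, 1))  # 7.8
```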
FIGURE 8: NVIDIA TESLA V100 GPU ARCHITECTURE
SOURCE: NVIDIA Tesla V100 GPU Architecture, white paper,
https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
Volta Streaming Multiprocessor
• The new Streaming Multiprocessor (SM) architecture delivers major improvements in
performance and energy efficiency.
• New mixed-precision Tensor Cores.
• 50% higher energy efficiency than the previous Pascal design on general compute workloads.
• High-performance L1 data cache.
• Each V100 SM has 64 FP32 cores and 32 FP64 cores.
• Supports more threads, warps, and thread blocks than prior GPU
generations.
• A 128 KB combined memory block serves as both shared memory and L1 cache, and can be
configured to provide up to 96 KB of shared memory.
• Each SM has four texture units, which also use the L1 cache; the shared-memory configuration determines how much L1 capacity remains.
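The per-SM resources scale to whole-GPU totals, and the shared-memory setting determines how much of the 128 KB block is left as L1. The sketch below assumes 80 enabled SMs (from NVIDIA's white paper, not this slide) and takes the 8 Tensor Cores per SM from the Tensor Cores slide later in this deck. The function name `l1_remaining_kb` is hypothetical.

```python
SMS_ENABLED = 80  # assumption: enabled SMs on a Tesla V100

PER_SM = {"fp32": 64, "fp64": 32, "tensor": 8, "texture": 4}
per_gpu = {unit: count * SMS_ENABLED for unit, count in PER_SM.items()}

COMBINED_KB = 128   # shared memory + L1 per SM
MAX_SHARED_KB = 96  # largest shared-memory carve-out

def l1_remaining_kb(shared_kb):
    """L1 capacity left in an SM after reserving shared_kb of shared memory."""
    if not 0 <= shared_kb <= MAX_SHARED_KB:
        raise ValueError("shared memory must be between 0 and 96 KB")
    return COMBINED_KB - shared_kb

print(per_gpu)              # {'fp32': 5120, 'fp64': 2560, 'tensor': 640, 'texture': 320}
print(l1_remaining_kb(96))  # 32
print(l1_remaining_kb(64))  # 64
```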
FIGURE 9: VOLTA GV100 Streaming
Multiprocessor (SM)
SOURCE: NVIDIA Tesla V100 GPU Architecture,
white paper, https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
Tensor Cores
• The V100 GPU contains 640 Tensor Cores: eight
(8) per SM and two (2) per processing
block (partition) within an SM.
• Each Tensor Core performs 64 floating-point
FMA (fused multiply-add)
operations per clock.
• For deep learning training, Tensor Cores
provide up to 12x higher peak TFLOPS on
Tesla V100 compared to Pascal.
• For deep learning inference, Tensor Cores
provide up to 6x higher peak TFLOPS on
Tesla V100 compared to Pascal.
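The Tensor Core counts above translate into a peak mixed-precision rate per GPU and, multiplied by Summit's GPU count, into a system-wide figure. The sketch assumes a 1.53 GHz boost clock, which comes from NVIDIA's white paper and not from this deck, and counts each FMA as two operations. The result is a theoretical upper bound, so it is only expected to land in the same range as the 3.3 exaops quoted in the introduction.

```python
TENSOR_CORES = 640
FMA_PER_CORE_PER_CLOCK = 64
OPS_PER_FMA = 2          # multiply + add
BOOST_CLOCK_GHZ = 1.53   # assumed boost clock
SUMMIT_GPUS = 27648      # from the specification slide

# GFLOP/s -> TFLOP/s per GPU
tensor_tflops = (TENSOR_CORES * FMA_PER_CORE_PER_CLOCK
                 * OPS_PER_FMA * BOOST_CLOCK_GHZ / 1000.0)

# TFLOP/s -> exa-operations per second across all GPUs
system_exaops = tensor_tflops * SUMMIT_GPUS / 1e6

print(round(tensor_tflops, 1))  # 125.3
print(round(system_exaops, 2))  # 3.47
```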
FIGURE 10: Pascal and Volta 4 x 4 matrix multiplication
SOURCE: NVIDIA Tesla V100 GPU Architecture, white paper,
https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
Tensor cores
• Each Tensor Core operates on 4x4 matrices and performs
the following operation:
• D = A×B + C, where A, B, C, and D are 4x4 matrices.
• Each FP16 multiply gives a full-precision product, which is
then accumulated using FP32 addition to produce the result.
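The mixed-precision operation described above can be emulated in plain Python to show where each rounding step happens. This is a sketch of the numerics only, not of how the hardware schedules the work. The function names are illustrative, and the standard `struct` module is used to round values to half and single precision.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def tensor_core_mma(a, b, c):
    """Return D = A x B + C for 4x4 matrices given as nested lists.

    A and B are rounded to FP16, each product is kept at full precision,
    and the running sum is rounded to FP32 after every addition.
    """
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                product = to_fp16(a[i][k]) * to_fp16(b[k][j])
                acc = to_fp32(acc + product)
            d[i][j] = acc
    return d

ones = [[1.0] * 4 for _ in range(4)]
twos = [[2.0] * 4 for _ in range(4)]
halves = [[0.5] * 4 for _ in range(4)]
print(tensor_core_mma(ones, twos, halves)[0])  # [8.5, 8.5, 8.5, 8.5]
```

Each entry of the result is 4 products of 1.0 × 2.0 plus the 0.5 from C, and all of these values are exactly representable in both FP16 and FP32.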
FIGURE 11: Tensor Core 4 x 4 Matrix Multiply and
accumulate
FIGURE 12: Mixed Precision Multiply and Accumulate in
Tensor core
Performance of Tensor Cores on Matrix
Multiplications
FIGURE 13: Single precision (FP32)
FIGURE 14: Mixed precision
NVIDIA NVLink
• In Summit, the
Tesla V100 accelerators and
POWER9 CPUs are connected with
NVLink.
• NVLink offers higher bandwidth
than PCIe interconnects.
• Each link provides 25
GB/s in each
direction.
FIGURE 15: NVIDIA NVLink
Interconnect
• Nodes are connected by a dual-rail Mellanox EDR (Enhanced Data Rate) InfiniBand network.
• Each node gets 25 GB/s of bandwidth.
• The dual-rail 100 Gb/s EDR InfiniBand interconnect carries both
storage and inter-process communication traffic.
• All nodes are interconnected in a non-blocking fat-tree topology.
• The fat tree is implemented with three levels.
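For a rough sense of why three levels are enough, the textbook capacity of a non-blocking fat tree built from identical k-port switches is 2·(k/2)^L endpoints for L levels, which is k³/4 for three levels. The sketch below assumes 36-port EDR switches purely for illustration. The switch radix is not stated in the source, and Summit's actual switch hardware may differ.

```python
def fat_tree_endpoints(ports_per_switch, levels):
    """Maximum endpoints of a non-blocking fat tree of identical switches.

    Below the top level, each switch uses half its ports downward and
    half upward, which gives 2 * (k/2)**levels endpoints.
    """
    half = ports_per_switch // 2
    return 2 * half ** levels

SWITCH_PORTS = 36    # assumed switch radix, not from the source
NODES = 4608
RAILS_PER_NODE = 2   # dual-rail: two network ports per node

ports_needed = NODES * RAILS_PER_NODE
print(ports_needed)                         # 9216
print(fat_tree_endpoints(SWITCH_PORTS, 2))  # 648, too few
print(fat_tree_endpoints(SWITCH_PORTS, 3))  # 11664, enough
```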
FIGURE 16: ConnectX-5 adapter and interface
with POWER9 chips
FIGURE 17: Fat Tree Topology
Application: finding drug compounds to fight the
coronavirus
• Summit was used to screen a library of 8,000 known FDA-approved drug compounds for candidates to
fight the coronavirus.
• It narrowed the library down to 77 candidates in just 2 days.
• Summit used the virus genome to search for a very specific type of drug compound.
• For comparison, Fugaku, now the world’s fastest computer, was used to conduct molecule-level
simulations.
• It narrowed 2,128 existing drugs down to 12 that bond easily to the virus’s proteins, in 10 days.
• Fugaku can perform more than 415 quadrillion computations per second, which is 2.8 times faster than Summit.
Comparison with other Supercomputers
| Rank | Rmax (PFLOPS) | Name | Model | Processor | Cores | Interconnect | Memory | Manufacturer | Operating system | Rpeak (PFLOPS) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 415.530 | FUGAKU | SUPERCOMPUTER FUGAKU | A64FX 48C 2.2GHz | 7,299,072 | Tofu interconnect D | 4,866,048 GB | Fujitsu | Red Hat Enterprise Linux | 513.855 |
| 2 | 148.6 | SUMMIT | IBM POWER SYSTEM AC922 | IBM POWER9 22C 3.07GHz | 2,414,592 | Dual-rail Mellanox EDR Infiniband | 2,801,664 GB | IBM | RHEL 7.4 | 200.795 |
| 3 | 94.640 | SIERRA | IBM POWER SYSTEM AC922 | IBM POWER9 22C 3.07GHz | 1,572,480 | Dual-rail Mellanox EDR Infiniband | 1,382,400 GB | IBM | RHEL 7.4 | 125.712 |
| 4 | 93.014 | SUNWAY TAIHULIGHT | SUNWAY MPP | SUNWAY SW26010 260C 1.45GHz | 10,649,600 | Sunway | 1,310,720 GB | NRCPC | Sunway RaiseOS 2.0.5 | 125.436 |
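The Rmax and Rpeak columns can be combined into an HPL efficiency (the fraction of theoretical peak actually sustained) and a speed ratio against Summit. The sketch below uses only the figures from the table above. The dictionary layout and names are illustrative.

```python
# (Rmax, Rpeak) in PFLOPS, copied from the comparison table.
SYSTEMS = {
    "Fugaku": (415.530, 513.855),
    "Summit": (148.6, 200.795),
    "Sierra": (94.640, 125.712),
    "Sunway TaihuLight": (93.014, 125.436),
}

summit_rmax = SYSTEMS["Summit"][0]
report = {}
for name, (rmax, rpeak) in SYSTEMS.items():
    report[name] = {
        "hpl_efficiency": rmax / rpeak,    # sustained fraction of peak
        "vs_summit": rmax / summit_rmax,   # Rmax relative to Summit
    }

for name, row in report.items():
    print(f"{name}: {row['hpl_efficiency']:.1%} of peak, "
          f"{row['vs_summit']:.2f}x Summit")
```

Fugaku's ratio comes out at about 2.8, matching the "2.8 times faster than Summit" statement on the application slide.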
Supercomputer development over the past 27 years
Pictured: CM-5, Fugaku, Sunway TaihuLight, Summit
•Thank you

More Related Content

What's hot

Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
Salem-Kabbani
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
butest
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018
Preferred Networks
 
An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learning
Big Data Colombia
 
Dr. iur.Pedro Bejarano Alomia, LL.M. - Neonaticide - Doctoral Thesis
Dr. iur.Pedro Bejarano Alomia, LL.M. - Neonaticide - Doctoral ThesisDr. iur.Pedro Bejarano Alomia, LL.M. - Neonaticide - Doctoral Thesis
Dr. iur.Pedro Bejarano Alomia, LL.M. - Neonaticide - Doctoral Thesis
Fritz Lang
 
Support Vector Machine (SVM)
Support Vector Machine (SVM)Support Vector Machine (SVM)
Support Vector Machine (SVM)
Sana Rahim
 
Supervised learning
Supervised learningSupervised learning
Supervised learning
ankit_ppt
 
NUMPY
NUMPY NUMPY
Classification by back propagation, multi layered feed forward neural network...
Classification by back propagation, multi layered feed forward neural network...Classification by back propagation, multi layered feed forward neural network...
Classification by back propagation, multi layered feed forward neural network...
bihira aggrey
 
An AI accelerator ASIC architecture
An AI accelerator ASIC architectureAn AI accelerator ASIC architecture
An AI accelerator ASIC architecture
Khanh Le
 
A brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesA brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to games
Thomas da Silva Paula
 
support vector machine 1.pptx
support vector machine 1.pptxsupport vector machine 1.pptx
support vector machine 1.pptx
surbhidutta4
 
AMD Ryzen Pro
AMD Ryzen ProAMD Ryzen Pro
AMD Ryzen Pro
Low Hong Chuan
 
NVIDIA Keynote #GTC21
NVIDIA Keynote #GTC21 NVIDIA Keynote #GTC21
NVIDIA Keynote #GTC21
Alison B. Lowndes
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
HJ van Veen
 
Super computer
Super computerSuper computer
Super computer
Saliya Rathnasiri
 
Python Intro
Python IntroPython Intro
Python Intro
Tim Penhey
 
Super computers
Super computersSuper computers
Super computers
mannatsidhu1
 
強化學習 Reinforcement Learning
強化學習 Reinforcement Learning強化學習 Reinforcement Learning
強化學習 Reinforcement Learning
Yen-lung Tsai
 
Introduction to OpenCL
Introduction to OpenCLIntroduction to OpenCL
Introduction to OpenCL
Unai Lopez-Novoa
 

What's hot (20)

Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018
 
An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learning
 
Dr. iur.Pedro Bejarano Alomia, LL.M. - Neonaticide - Doctoral Thesis
Dr. iur.Pedro Bejarano Alomia, LL.M. - Neonaticide - Doctoral ThesisDr. iur.Pedro Bejarano Alomia, LL.M. - Neonaticide - Doctoral Thesis
Dr. iur.Pedro Bejarano Alomia, LL.M. - Neonaticide - Doctoral Thesis
 
Support Vector Machine (SVM)
Support Vector Machine (SVM)Support Vector Machine (SVM)
Support Vector Machine (SVM)
 
Supervised learning
Supervised learningSupervised learning
Supervised learning
 
NUMPY
NUMPY NUMPY
NUMPY
 
Classification by back propagation, multi layered feed forward neural network...
Classification by back propagation, multi layered feed forward neural network...Classification by back propagation, multi layered feed forward neural network...
Classification by back propagation, multi layered feed forward neural network...
 
An AI accelerator ASIC architecture
An AI accelerator ASIC architectureAn AI accelerator ASIC architecture
An AI accelerator ASIC architecture
 
A brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesA brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to games
 
support vector machine 1.pptx
support vector machine 1.pptxsupport vector machine 1.pptx
support vector machine 1.pptx
 
AMD Ryzen Pro
AMD Ryzen ProAMD Ryzen Pro
AMD Ryzen Pro
 
NVIDIA Keynote #GTC21
NVIDIA Keynote #GTC21 NVIDIA Keynote #GTC21
NVIDIA Keynote #GTC21
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
 
Super computer
Super computerSuper computer
Super computer
 
Python Intro
Python IntroPython Intro
Python Intro
 
Super computers
Super computersSuper computers
Super computers
 
強化學習 Reinforcement Learning
強化學習 Reinforcement Learning強化學習 Reinforcement Learning
強化學習 Reinforcement Learning
 
Introduction to OpenCL
Introduction to OpenCLIntroduction to OpenCL
Introduction to OpenCL
 

Similar to Hardware architecture of Summit Supercomputer

DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe Conference
LEGATO project
 
Future Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPCFuture Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPC
Slide_N
 
HPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand ChallengeHPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand Challenge
Anand Haridass
 
POWER9 for AI & HPC
POWER9 for AI & HPCPOWER9 for AI & HPC
POWER9 for AI & HPC
inside-BigData.com
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
NVIDIA Taiwan
 
Intel new processors
Intel new processorsIntel new processors
Intel new processors
zaid_b
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
LEGATO project
 
Sx 6-single-node
Sx 6-single-nodeSx 6-single-node
Sx 6-single-node
Léia de Sousa
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
inside-BigData.com
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTM
Slide_N
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
inside-BigData.com
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet Processing
Michelle Holley
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC Cloud
Ryousei Takano
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
Wilhelm van Belkum
 
AI Accelerators for Cloud Datacenters
AI Accelerators for Cloud DatacentersAI Accelerators for Cloud Datacenters
AI Accelerators for Cloud Datacenters
CastLabKAIST
 
uCluster
uClusteruCluster
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overview
lambertt
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Michelle Holley
 
A15 ibm informix on power8 power linux
A15 ibm informix on power8  power linuxA15 ibm informix on power8  power linux
A15 ibm informix on power8 power linux
BeGooden-IT Consulting
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
OPNFV
 

Similar to Hardware architecture of Summit Supercomputer (20)

DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe Conference
 
Future Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPCFuture Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPC
 
HPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand ChallengeHPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand Challenge
 
POWER9 for AI & HPC
POWER9 for AI & HPCPOWER9 for AI & HPC
POWER9 for AI & HPC
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
Intel new processors
Intel new processorsIntel new processors
Intel new processors
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
Sx 6-single-node
Sx 6-single-nodeSx 6-single-node
Sx 6-single-node
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTM
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet Processing
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC Cloud
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
AI Accelerators for Cloud Datacenters
AI Accelerators for Cloud DatacentersAI Accelerators for Cloud Datacenters
AI Accelerators for Cloud Datacenters
 
uCluster
uClusteruCluster
uCluster
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overview
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
 
A15 ibm informix on power8 power linux
A15 ibm informix on power8  power linuxA15 ibm informix on power8  power linux
A15 ibm informix on power8 power linux
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
 

Recently uploaded

Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 
Zener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and ApplicationsZener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and Applications
Shiny Christobel
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
ecqow
 
Mechatronics material . Mechanical engineering
Mechatronics material . Mechanical engineeringMechatronics material . Mechanical engineering
Mechatronics material . Mechanical engineering
sachin chaurasia
 
OOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming languageOOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming language
PreethaV16
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
AlvianRamadhani5
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
Kamal Acharya
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
MadhavJungKarki
 
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
upoux
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
harshapolam10
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
21UME003TUSHARDEB
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
vmspraneeth
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
Divyanshu
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
nedcocy
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
UReason
 
Open Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surfaceOpen Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surface
Indrajeet sahu
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
Prakhyath Rai
 
P5 Working Drawings.pdf floor plan, civil
P5 Working Drawings.pdf floor plan, civilP5 Working Drawings.pdf floor plan, civil
P5 Working Drawings.pdf floor plan, civil
AnasAhmadNoor
 
Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...
cannyengineerings
 
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
Anant Corporation
 

Recently uploaded (20)

Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 
Zener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and ApplicationsZener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and Applications
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 
Mechatronics material . Mechanical engineering
Mechatronics material . Mechanical engineeringMechatronics material . Mechanical engineering
Mechatronics material . Mechanical engineering
 
OOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming languageOOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming language
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
 
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
Open Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surfaceOpen Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surface
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
 
P5 Working Drawings.pdf floor plan, civil
P5 Working Drawings.pdf floor plan, civilP5 Working Drawings.pdf floor plan, civil
P5 Working Drawings.pdf floor plan, civil
 
Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...Pressure Relief valve used in flow line to release the over pressure at our d...
Pressure Relief valve used in flow line to release the over pressure at our d...
 
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
 

Hardware architecture of Summit Supercomputer

  • 1. SUMMIT SUPERCOMPUTER Supervisor: Dr. R. Venkatesan Presentation by: Vigneshwar Ramaswamy Masc. in Computer Engineering MUN ID: 201990029 Memorial University of Newfoundland, Canada Summit Supercomputer Architecture 1
  • 2. Outline • Introduction • Summit Overview • Specification of Summit • IBM Power9 Architecture • NVIDIA Tesla V100 Architecture • Interconnect • Application Summit Supercomputer Architecture 2
  • 3. Introduction • Summit was the fastest computer in the world from November 2018 to June 2020. • 2nd Rank on TOP500 peak speed 148.6 pflops ( High Performance Linpack benchmark). • 8th Rank on Green500 with power efficiency of 14.719 Gflops/watt. • As of June 2018 – 2020, the summit topped HPCG benchmark used by 5 out of 6 Gordon Bell Finalist teams. • Summit has Achieved to reach exa operations per second (exaop), achieving 1.88 exaops during a Genmoic Analysis and expected to reach 3.3 exaops using mixed precision calculations. Summit Supercomputer Architecture 3
  • 4. Summit Overview and Specifications • Processor: IBM POWER9™ (2/node) • GPUs: 27,648 NVIDIA Volta V100s (6/node) • Theoretical Peak (Rpeak) performance :200 Pflops • Linpack performance :-148.6 PFlops. • It has 2,414,592 cores • 250petabytes storage capacity • Nodes: 4,608 • Memory/ each node: 512GB DDR4 + 96GB HBM2 (1/2TF,CPU-GPU accessing) • NV Memory/node: 1600GB • Total System Memory: >10PB DDR4 + HBM + Non-volatile Summit Supercomputer Architecture 4
  • 5. Summit Overview and Specifications
      • Interconnect topology: Mellanox EDR 100G InfiniBand, non-blocking fat tree
      • 25 gigabytes per second between nodes
      • In-network computing acceleration for communications frameworks such as MPI (Message Passing Interface)
      • Peak power consumption: 13 MW
      • Operating system: Red Hat Enterprise Linux (RHEL) version 7.4
  • 6. Summit Nodes
      FIGURE 1: SUMMIT NODE BLOCK DIAGRAM
      SOURCE: Summit, Oak Ridge National Laboratory (official web page), https://www.olcf.ornl.gov/summit/
  • 7. IBM POWER9 Processor
      • The POWER9 chip has 24 SMT4 cores (4 hardware threads/core); Summit's processors have 22 active cores.
      • Peripheral Component Interconnect Express (PCI Express) Gen4
      • NVLink 2.0
      • 14 nm FinFET semiconductor process with 8.0 billion transistors
      • High-bandwidth signaling technology:
          • 16 Gb/s interface: local SMP
          • 25 Gb/s interface (25G Link): accelerator, remote SMP
      FIGURE 2: POWER9 ARCHITECTURE
      SOURCE: S. K. Sadasivam, B. W. Thompto, R. Kalla and W. J. Starke, "IBM Power9 Processor Architecture," in IEEE Micro, vol. 37, no. 2, pp. 40-51, Mar.-Apr. 2017. doi: 10.1109/MM.2017.40
  • 8. Core pipeline
      • The microarchitecture has a shorter pipeline than POWER8.
      • It removes the instruction grouping technique.
      • It introduces new features that proactively avoid hazards in the load store unit (LSU) and improve the LSU's execution efficiency.
      • It completes up to 128 instructions per cycle (SMT4).
      • New lock management control improves performance.
      FIGURE 3: POWER9 VS POWER8 PIPELINE STAGES
      SOURCE: S. K. Sadasivam, B. W. Thompto, R. Kalla and W. J. Starke, "IBM Power9 Processor Architecture," in IEEE Micro, vol. 37, no. 2, pp. 40-51, Mar.-Apr. 2017. doi: 10.1109/MM.2017.40
  • 9. Key components of Power9 core
      Figure 4: SMT4 Core
      Figure 5: SMT8 Core
      Figure 6: Power9 SMT4 core. The detailed core block diagram shows all the key components of the Power9 core.
  • 10. Cache Capacity of POWER9 Processor
      • L1I: 32 KiB (per core, 8-way set associative)
      • L1D: 32 KiB (per core, 8-way set associative)
      • L2: 512 KiB (per pair of cores)
      • L3: 120 MiB eDRAM, 20-way set associative
      FIGURE 7: SMT8 Cache
      SOURCE: S. K. Sadasivam, B. W. Thompto, R. Kalla and W. J. Starke, "IBM Power9 Processor Architecture," in IEEE Micro, vol. 37, no. 2, pp. 40-51, Mar.-Apr. 2017. doi: 10.1109/MM.2017.40
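As a rough illustration of how the per-pair and chip-wide cache figures relate, the sketch below combines them with the 24 SMT4 cores quoted on the POWER9 processor slide. It assumes the L3 is divided evenly among core pairs, which the slide does not state explicitly.

```python
CORES = 24             # SMT4 cores, from the POWER9 processor slide
L2_KIB_PER_PAIR = 512  # from this slide
L3_MIB_TOTAL = 120     # from this slide

core_pairs = CORES // 2

# Total L2 across the chip, converted from KiB to MiB.
l2_total_mib = core_pairs * L2_KIB_PER_PAIR / 1024

# Assumption: the L3 is split evenly among the core pairs.
l3_mib_per_pair = L3_MIB_TOTAL / core_pairs

print(f"Core pairs:         {core_pairs}")
print(f"Total L2:           {l2_total_mib:.0f} MiB")
print(f"L3 per core pair:   {l3_mib_per_pair:.0f} MiB")
```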
  • 11. NVIDIA Tesla V100 GPU Architecture
      • The GPU is built with 21 billion transistors.
      • It has a peak double-precision (FP64) performance of 7.8 TFLOP/s.
      • It has a peak single-precision (FP32) performance of 15.7 TFLOP/s.
      • The full GV100 die has 5376 FP32 cores, 5376 INT32 cores, 2688 FP64 cores, 672 Tensor Cores, and 336 texture units across 84 SMs; the Tesla V100 enables 80 of those SMs.
      • Eight 512-bit memory controllers control access to the 16 GB of HBM2 memory.
      • A 6 MB L2 cache is available to the SMs.
      • NVIDIA's NVLink interconnect passes data between GPUs as well as from CPU to GPU.
      FIGURE 8: NVIDIA TESLA V100 GPU ARCHITECTURE
      SOURCE: NVIDIA TESLA V100 GPU Architecture, White paper, https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
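The peak FP32 and FP64 figures on this slide can be approximated from the core counts. The sketch below rests on three assumptions that the slide itself does not state: 80 of the die's 84 SMs are enabled on the V100, the boost clock is 1530 MHz, and each core retires one fused multiply-add (2 floating-point operations) per clock.

```python
# Core counts for the full 84-SM die, as listed on the slide.
FULL_DIE_SMS = 84
FP32_PER_SM = 5376 // FULL_DIE_SMS  # 64
FP64_PER_SM = 2688 // FULL_DIE_SMS  # 32

# Assumptions (not stated on the slide).
SMS_ENABLED = 80     # SMs active on the Tesla V100 product
BOOST_HZ = 1.53e9    # boost clock in Hz
FLOPS_PER_FMA = 2    # one multiply plus one add


def peak_tflops(cores_per_sm: int) -> float:
    """Peak throughput in TFLOP/s for a given core type."""
    return SMS_ENABLED * cores_per_sm * FLOPS_PER_FMA * BOOST_HZ / 1e12


fp32_tflops = peak_tflops(FP32_PER_SM)
fp64_tflops = peak_tflops(FP64_PER_SM)
print(f"FP32 peak: {fp32_tflops:.1f} TFLOP/s")
print(f"FP64 peak: {fp64_tflops:.1f} TFLOP/s")
```

Under these assumptions the estimate rounds to the 15.7 and 7.8 TFLOP/s quoted above.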
  • 12. Volta Streaming Multiprocessor
      • The new Streaming Multiprocessor (SM) architecture delivers major improvements in performance and energy efficiency.
      • New mixed-precision Tensor Cores.
      • 50% higher efficiency on general computation workloads.
      • High-performance L1 data cache.
      • Each V100 SM has 64 FP32 cores and 32 FP64 cores.
      • It supports more threads, warps, and thread blocks than prior GPU generations.
      • A 128 KB combined memory block for shared memory and L1 cache can be configured to allow up to 96 KB of shared memory.
      • Each SM has four texture units, which also use the L1 cache.
      FIGURE 9: VOLTA GV100 Streaming Multiprocessor (SM)
      SOURCE: NVIDIA TESLA V100 GPU Architecture, White paper, https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
  • 13. Tensor Cores
      • The V100 GPU contains 640 Tensor Cores: eight (8) per SM and two (2) per processing block (partition) within an SM.
      • Each Tensor Core performs 64 floating-point FMA (fused multiply-add) operations per clock.
      • For deep learning training, Tensor Cores provide up to 12x higher peak TFLOPS on Tesla V100 compared to Pascal.
      • For deep learning inference, Tensor Cores provide up to 6x higher peak TFLOPS on Tesla V100 compared to Pascal.
      FIGURE 10: Pascal and Volta 4 x 4 matrix multiplication
      SOURCE: NVIDIA TESLA V100 GPU Architecture, White paper, https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
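The 12x and 6x claims can be sanity-checked from the Tensor Core count and per-clock FMA rate on this slide. The sketch below assumes a 1530 MHz boost clock for the V100 and Pascal (Tesla P100) peaks of 10.6 TFLOPS FP32 and 21.2 TFLOPS FP16; none of these three values appear in the slides, so treat them as assumptions.

```python
# From the slide.
TENSOR_CORES = 640
FMA_PER_CORE_PER_CLOCK = 64

# Assumptions (not stated in the slides).
BOOST_HZ = 1.53e9        # V100 boost clock in Hz
P100_FP32_TFLOPS = 10.6  # Pascal peak used for the training comparison
P100_FP16_TFLOPS = 21.2  # Pascal peak used for the inference comparison

# Each FMA counts as two floating-point operations.
tensor_tflops = TENSOR_CORES * FMA_PER_CORE_PER_CLOCK * 2 * BOOST_HZ / 1e12

training_speedup = tensor_tflops / P100_FP32_TFLOPS
inference_speedup = tensor_tflops / P100_FP16_TFLOPS

print(f"Tensor Core peak:        {tensor_tflops:.1f} TFLOPS")
print(f"vs Pascal FP32 (train):  {training_speedup:.1f}x")
print(f"vs Pascal FP16 (infer):  {inference_speedup:.1f}x")
```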
  • 14. Tensor Cores
      • Each Tensor Core operates on a 4x4 matrix and performs the following operation: D = A×B + C, where A, B, C, and D are 4x4 matrices.
      • Each FP16 multiply gives a full-precision product, which is accumulated in an FP32 addition to provide the result.
      FIGURE 11: Tensor Core 4 x 4 Matrix Multiply and Accumulate
      FIGURE 12: Mixed Precision Multiply and Accumulate in Tensor Core
  • 15. Performance of Tensor Cores on Matrix Multiplications
      FIGURE 13: Single precision (FP32)
      FIGURE 14: Mixed precision
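The mixed-precision operation D = A×B + C can be emulated in software to show why FP32 accumulation matters. The sketch below is an illustration only, not the hardware's implementation: it rounds inputs to FP16 and running sums to FP32 with Python's `struct` module, and the sequential accumulation order and helper names are assumptions.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tensor_core_mma(a, b, c, n=4):
    """Emulate D = A x B + C with FP16 inputs and FP32 accumulation."""
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                # The product of two FP16 values fits exactly in FP32.
                product = to_fp16(a[i][k]) * to_fp16(b[k][j])
                acc = to_fp32(acc + product)  # FP32 accumulation
            d[i][j] = acc
    return d


# 2048 * 1 + 1 * 1 = 2049, which FP32 can hold but FP16 cannot.
a = [[2048.0, 1.0, 0.0, 0.0]] + [[0.0] * 4 for _ in range(3)]
b = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]
c = [[0.0] * 4 for _ in range(4)]

d = tensor_core_mma(a, b, c)
print("FP32 accumulate:", d[0][0])
print("Same sum rounded to FP16:", to_fp16(2049.0))
```

In this example the FP32 accumulator keeps the exact sum, while rounding the same sum to FP16 would lose the final unit.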
  • 16. NVIDIA NVLink
      • In the Summit supercomputer, the Tesla V100 accelerators and Power9 CPUs are connected with NVLink.
      • NVLink delivers higher performance than PCIe interconnects.
      • Each link provides 25 gigabytes/second in each direction.
      FIGURE 15: NVIDIA NVLink
  • 17. Interconnect
      • Nodes are connected by a Mellanox dual-rail EDR InfiniBand network.
      • Each node gets 25 GB/s of bandwidth.
      • The dual-rail Mellanox EDR (Enhanced Data Rate) 100 Gb/s InfiniBand interconnect carries both storage and inter-process communication traffic.
      • All nodes are interconnected in a non-blocking fat tree topology.
      • The fat tree is implemented as a three-level tree.
      FIGURE 16: ConnectX-5 adapter and interface with POWER9 chips
      FIGURE 17: Fat Tree Topology
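Two of the numbers on this slide can be related with simple arithmetic: the 25 GB/s per-node bandwidth and the three-level tree. The sketch below uses the raw 100 Gb/s signalling rate per rail and ignores encoding and protocol overhead. The 36-port switch radix is a hypothetical value chosen for illustration, since the slides do not state the switch port count; the host-count formula is the standard one for a non-blocking fat tree in which each switch splits its ports evenly between down-links and up-links.

```python
# From the slide.
RAILS = 2
EDR_GBIT_PER_RAIL = 100
SUMMIT_NODES = 4_608

# Two 100 Gb/s rails, converted from gigabits to gigabytes per second.
node_gbytes_per_s = RAILS * EDR_GBIT_PER_RAIL / 8


def fat_tree_max_hosts(radix: int, levels: int) -> int:
    """Maximum hosts in a non-blocking fat tree of `radix`-port switches."""
    return 2 * (radix // 2) ** levels


HYPOTHETICAL_RADIX = 36  # assumption: port count per switch

two_level = fat_tree_max_hosts(HYPOTHETICAL_RADIX, 2)
three_level = fat_tree_max_hosts(HYPOTHETICAL_RADIX, 3)

print(f"Per-node bandwidth: {node_gbytes_per_s:.0f} GB/s")
print(f"Two-level capacity:   {two_level:,} hosts")
print(f"Three-level capacity: {three_level:,} hosts")
print(f"Three levels needed for {SUMMIT_NODES:,} nodes:",
      two_level < SUMMIT_NODES <= three_level)
```

With that assumed radix, a two-level tree is too small for 4,608 nodes while a three-level tree has room to spare, which is consistent with the three-level design described above.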
  • 18. Application: Finding drug compounds to fight the coronavirus
      • Summit was used to screen a library of 8000 known FDA-approved drug compounds for candidates to fight the coronavirus.
      • It narrowed the set down to 77 in just 2 days.
      • Summit used the virus genome to search for a very specific type of drug compound.
      • For comparison, Fugaku, the world's fastest computer, was used to run molecule-level simulations; it narrowed 2128 existing drugs down to 12 that bind easily to the proteins, in 10 days.
      • Fugaku can perform more than 415 quadrillion computations per second, which is 2.8 times faster than Summit.
  • 19. Comparison with other Supercomputers
      | Rank | Rmax (PFLOPS) | Name | Model | Processor | Cores | Interconnect | Memory | Manufacturer | Operating system | Rpeak (PFLOPS) |
      |------|---------------|------|-------|-----------|-------|--------------|--------|--------------|------------------|----------------|
      | 1 | 415.530 | FUGAKU | SUPERCOMPUTER FUGAKU | A64FX 48C 2.2GHz | 7,299,072 | Tofu interconnect D | 4,866,048 GB | Fujitsu | Red Hat Enterprise Linux | 513.855 |
      | 2 | 148.6 | SUMMIT | IBM POWER SYSTEM AC922 | IBM POWER9 22C 3.07GHz | 2,414,592 | Dual-rail Mellanox EDR InfiniBand | 2,801,664 GB | IBM | RHEL 7.4 | 200.795 |
      | 3 | 94.640 | SIERRA | IBM POWER SYSTEM AC922 | IBM POWER9 22C 3.07GHz | 1,572,480 | Dual-rail Mellanox EDR InfiniBand | 1,382,400 GB | IBM | RHEL 7.4 | 125.712 |
      | 4 | 93.014 | SUNWAY TAIHULIGHT | SUNWAY MPP | SUNWAY SW26010 260C 1.45GHz | 10,649,600 | Sunway | 1,310,720 GB | NRCPC | Sunway RaiseOS 2.0.5 | 125.436 |
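The Rmax and Rpeak columns of the comparison allow two quick derived figures: how much of its theoretical peak each system sustains on Linpack, and how each compares with Summit. The sketch below uses only the values in the comparison; the dictionary layout is illustrative.

```python
# (Rmax, Rpeak) in PFLOPS, taken from the comparison.
systems = {
    "Fugaku": (415.530, 513.855),
    "Summit": (148.6, 200.795),
    "Sierra": (94.640, 125.712),
    "Sunway TaihuLight": (93.014, 125.436),
}

summit_rmax = systems["Summit"][0]

# Fraction of theoretical peak sustained on Linpack.
efficiency = {name: rmax / rpeak for name, (rmax, rpeak) in systems.items()}

# Linpack performance relative to Summit.
vs_summit = {name: rmax / summit_rmax for name, (rmax, _) in systems.items()}

for name in systems:
    print(f"{name:18s} Rmax/Rpeak = {efficiency[name]:.1%}, "
          f"{vs_summit[name]:.2f}x Summit")
```

The Fugaku-to-Summit ratio rounds to 2.8, matching the figure given in the application slide.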
  • 20. Supercomputer development over the past 27 years
      Pictured: CM-5 Supercomputer, Fugaku Supercomputer, Sunway TaihuLight, Summit Supercomputer