1 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Presenter Name:
Date:
Presented to:
2 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) such as the features,
functionality, performance, availability, timing and expected benefits of AMD Instinct™ accelerators; AMD Instinct™ MI300A
accelerators; AMD Instinct™ MI300X accelerators; AMD CDNA™ 3 processors; El Capitan; expected upstreaming backend
support for AMD GPUs with OpenAI; AMD GPU architecture roadmaps; and expected AMD data center energy efficiencies,
which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-
looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends,"
"projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this
presentation are based on current beliefs, assumptions and expectations, speak only as of the date of this presentation and
involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements
are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally
beyond AMD's control, that could cause actual results and other future events to differ materially from those expressed in, or
implied or projected by, the forward-looking information and statements. Investors are urged to review in detail the risks and
uncertainties in AMD’s Securities and Exchange Commission filings, including but not limited to AMD’s most recent reports on
Forms 10-K and 10-Q. AMD does not assume, and hereby disclaims, any obligation to update forward-looking statements
made in this presentation, except as may be required by law.
3 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Samsung Exynos
Magic Leap
Tesla
PlayStation 5
Xbox Series X|S
Steam Deck
Radeon™ RX 7000 Series
Radeon™ W7000 Series
Ryzen™ 7000 Series
AMD Instinct™ MI250
AMD Instinct™ MI210
Radeon™ PRO V620
AWS, Microsoft Azure
Frontier (US)
LUMI (Finland/EU)
Adastra (France)
Setonix (Australia)
4 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
AGENCIES AND ACADEMIC INSTITUTIONS
must accelerate progress toward better lives, economies, and communities
THE ENTERPRISE
must leverage HPC and AI
innovations quickly and profitably
RESEARCHERS
must solve the greatest
problems in history faster
5 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Power needs increase
exponentially with scale
Demand is high for speedy
outcomes
Architectures are rigid
• In 2008, DARPA set an exascale power goal of 20 MW by 2015.1 That target has never been reached.
• Training OpenAI's GPT-3 natural-language processing model can produce 552 metric tons of carbon emissions2
• The growing volume of data is driving 5.6% yearly HPC growth across sectors through 20273
• AI market growth will average 33.6% annually from 2021 to 20284
• HPC systems' typical lifecycle: 3–5 years5
• Firms that contribute to open source get up to 100% more value from its use than those that don't6
1 HPCwire, July 14, 2021; 2 Patterson, David and Gonzalez, Joseph and Le, Quoc and Liang, Chen and Munguia, Lluis-Miquel and Rothchild, Daniel and So, David and
Texier, Maud and Dean, Jeff, “Carbon Emissions and Large Neural Network Training” (Figure 4), arXiv, 2021, ©arXiv.org perpetual, non-exclusive license; 3 Blue Weave
Consulting Industry Trends & Forecast Report, Global High-Performance Computing (HPC) Market, Oct 2021; 4 Fortune Business Insights Industry Report, Artificial
Intelligence, Sept 2021; 5 AWS White Paper, “Challenging Barriers to HPC in the Cloud,” Oct 2019; 6 Nagle, Frank (2018), “Learning by Contributing: Gaining Competitive
Advantage Through Contribution to Crowdsourced Public Goods,” Organization Science 29(4):569-587
6 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
EFFICIENCY: Maximize performance per watt to ease power limitations
PLATFORM FLEXIBILITY: Adapt your application set to your goals without having to reinvest
VERSATILE PERFORMANCE: Host a variety of workloads simply within the same architecture
CODE READINESS: Maintain device-level optimization even as your systems evolve
7 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
… you could redefine your potential for fast and scalable results with breakthrough performance in HPC and AI?
… you had a wide-ranging ecosystem of open software to tap, so you don't always have to code down to the subcomponent?
… your systems could scale and support many kinds of workloads with streamlined programmability between the CPU and GPU?
8 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
9 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
• First to break the exascale barrier
• 7 of the top 10 most efficient systems rely on AMD
• 9.95 exaflops on the HPL-MxP mixed-precision benchmark
Source: TOP500, Green500, and HPL-MxP lists, as of June 2023
10 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Source: TOP500, Green500, and HPL-MxP lists, as of June 2023
11 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Roadmaps Subject to Change
12 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
13 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
SUPERCOMPUTING
Exascale proven with leadership scaling, performance, and efficiency in the market.*
 Leadership peak performance and performance per watt
 Memory coherency, high-bandwidth memory, low-latency GPU-to-GPU and CPU-to-GPU interconnect
 Dedicated software team for joint application porting, optimization, and scaling
MI250X

COMMERCIAL
Attractive performance/$ across a broad range of systems and applications
 Broad portfolio of servers and configurations from leading OEM/ODMs
 Exceptional HPL performance with MI250 and ease of deployment with MI210
 Turnkey software setup for key HPC applications, ISV software, and AI frameworks
MI250 MI210

AI / ML
Outstanding solution for large ML model training with MI250, and MI210 for inference
 Leading FP16 performance*
 Highest memory density* to support large ML models
 Ease of transition on leading ML frameworks like PyTorch and TensorFlow
MI250 MI210

* AMD Instinct MI250 vs. Nvidia A100 SXM – See Endnotes MI200-01, MI200-18
14 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
15 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
[Chart: peak specification comparison across FP64 vector, FP32 vector, FP64 matrix, FP32 matrix, FP16 matrix, memory size, and peak memory bandwidth.]
NOTE: THE A100 TF32 DATA FORMAT IS NOT IEEE FP32 COMPLIANT, SO IT IS NOT INCLUDED IN THIS COMPARISON.
SEE ENDNOTES: MI200-01, MI200-07
16 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
2022: GROMACS, OpenMM, LAMMPS, NAMD, AMBER, Relion, Hoomd-Blue, MILC, NWChem, OpenFOAM, PETSc, PyTorch, TensorFlow, ONNX, JAX, Ansys Mechanical, Cascade Charles, Altair AcuSolve, Tempoquest WRF/AceCast
2023: VTKm, HACC, AMG Solve, BDAS, Quantum Espresso, VASP, QMCPack, Quicksilver, Nekbone, PENNANT, Kripke, Ansys Fluent, Dassault XFlow, Dassault CST, STAR-CCM+, Devito Pro
A100 SXM (80GB) VS MI250 OAM (128GB)
[Chart: MI250 speedups over A100 ranging from ~1.2x to ~2.9x across the workloads below.]
LAMMPS (ReaxFF, 4 GPU), GROMACS (STMV, 4 GPU), OpenMM (DP AmoebaPME, 1 GPU), NAMD3 (STMV NVE, 4 GPU), AMBER (Cellulose NVE, 1 GPU), AMG Solve (4 GPU), OpenFOAM (HPC Motorbike, 4 GPU), PyFR (NACA0021, 1 GPU), HPCG (4 GPU), HPL (4 GPU)
See Endnotes MI200-69A, MI200-70A, MI200-73, MI200-74, MI200-75, MI200-76, MI200-77, MI200-79, MI200-82, MI200-90
AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
JAX | DEEPSPEED | CUPY
MICROSOFT SUPERBENCH: [Charts: single-GPU and multi-GPU (4 GPU) model training throughput across models of 66M–1.3B parameters.]
See Endnotes MI200-72, MI200-80
18 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
AMD DATACENTER ENERGY EFFICIENCY
2020–2025: AMD has a goal to deliver a 30X increase in energy efficiency for AMD processors and accelerators in HPC and AI training.
AMD Instinct accelerators power the #2 through #7 Green500 systems.*
See Endnotes MI200-69A, MI200-81. *Learn more: www.amd.com/en/corporate-responsibility/data-center-sustainability
19 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
20 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
[Charts: leadership HPC performance (peak TFLOPS) and leadership efficiency (GFLOPS/Watt) across FP64 (Tensor vs. Matrix), FP64, FP32 Matrix, and FP32 for each accelerator.]
*THE A100 TF32 DATA FORMAT IS NOT IEEE FP32 COMPLIANT, SO IT IS NOT INCLUDED IN THIS COMPARISON.
SEE ENDNOTES: MI200-41, MI200-44
21 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
A100 PCIe® (80GB) VS MI210 PCIe® (64GB)
[Chart: MI210 performance relative to A100: LAMMPS ~1.6x, GROMACS ~1.0x, OpenMM ~1.1x, Quicksilver ~1.3x, AMG Solve ~1.0x, OpenFOAM ~1.0x, HPL 1.6x.]
LAMMPS (ReaxFF, 8 GPU), GROMACS (STMV, 8 GPU), OpenMM (DP AmoebaPME, 1 GPU), AMG Solve (FOM_Solv/sec, 8 GPU), Quicksilver (2 GPU), HPL (8 GPU), OpenFOAM (HPC Motorbike, 8 GPU)
See Endnotes MI200-47A, MI200-85, MI200-86, MI200-78, MI200-87, MI200-49A, MI200-91
AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
ENTERPRISE SOLUTION WHEN SPACE AND POWER ARE AT A PREMIUM

Benchmark        MI250 (1 / 2 / 4 GPU)             MI210 (2 / 4 / 8 GPU)
GROMACS STMV     ~34.2 / ~61.8 / ~89.3             ~36.1 / ~64.7 / ~82.7
LAMMPS ReaxFF    ~6.8E+06 / ~1.4E+07 / ~2.6E+07    ~7.0E+06 / ~1.4E+07 / ~2.7E+07
HPL              ~40.5 / ~80.7 / ~162.0            ~40.9 / ~81.1 / ~159.7

See Endnote MI200-88
MI250 LAMMPS ReaxFF: 1 GPU ~6791085.2, 2 GPU ~13545169.9, 4 GPU ~26329097.1
MI210 LAMMPS ReaxFF: 2 GPU ~7042464, 4 GPU ~13795259.8, 8 GPU ~27232174.1
23 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Roadmaps Subject to Change
Sampling
24 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Next-gen AI accelerator architecture
3D packaging with 4th Gen AMD
Infinity architecture
Optimized for performance
and power efficiency
Dedicated accelerator
engines for AI and HPC
25 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
World's first APU accelerator for AI and HPC
AMD Instinct™ MI300A
Next-Gen Accelerator Architecture
128GB HBM3
5nm and 6nm Process Technology
Shared Memory CPU + GPU
24 CPU Cores
26 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
TECHNOLOGY | PERFORMANCE | EFFICIENCY
▪ World's first data center APU, marrying the best of EPYC™ CPU and AMD Instinct™ GPU
▪ Groundbreaking design and 3D packaging designed for leadership memory capacity, bandwidth, and reduced application latency
▪ Open, robust, and scalable software with AMD ROCm™
▪ Outstanding TCO and excellent HPC performance, with ease of programming
▪ CPU-GPU unified memory with 128GB HBM3 for optimized performance and streamlined programmability
▪ Unparalleled compute power over the prior generation, with an estimated 8X AI performance vs. MI250X1
▪ APU architecture designed for power savings compared to a discrete implementation
▪ Unified efficiency with "Zen 4" and CDNA™ 3, with fine-grained power sharing for improved efficiency
▪ Estimated 5X AI performance-per-watt improvement vs. prior generation MI250X2
See Endnote 1MI300-03 and 2MI300-04. Preliminary data and projections, subject to change.
27 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
• CPU and GPU cores share a unified on-package pool of memory
• CPU | GPU | Cache | HBM
See Endnote MI300-03. Preliminary data and projections, subject to change.
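To make the streamlined programmability concrete, here is a minimal HIP sketch (a hypothetical toy example, not MI300A-specific code). hipMallocManaged returns a single pointer usable from both CPU and GPU code, so the explicit copy-in/copy-out steps of a discrete-GPU workflow disappear; on a discrete GPU the runtime migrates pages on demand, while on an APU such as MI300A both sides address the same physical HBM pool.

    #include <hip/hip_runtime.h>
    #include <cstdio>

    // GPU kernel scales the buffer in place; no staging copies are made.
    __global__ void scale(float* x, int n, float a) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= a;
    }

    int main() {
        const int n = 1 << 20;
        float* x = nullptr;
        hipMallocManaged((void**)&x, n * sizeof(float)); // one pointer for CPU and GPU
        for (int i = 0; i < n; ++i) x[i] = 1.0f;         // CPU initializes in place
        scale<<<(n + 255) / 256, 256>>>(x, n, 2.0f);     // no hipMemcpy before or after
        hipDeviceSynchronize();
        printf("x[0] = %.1f\n", x[0]);                   // CPU reads the result: 2.0
        hipFree(x);
        return 0;
    }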
28 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
30 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Expected Late 2023
Content provided by LLNL
31 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Leadership generative AI accelerator
AMD Instinct™ MI300X
Introducing today
192GB
HBM3
5.2TB/s
Memory Bandwidth
896GB/s
Infinity Fabric™ Bandwidth
153B
Transistors
32 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Leadership generative AI accelerator
AMD Instinct™ MI300X
Introducing today
Up to 1.6x HBM bandwidth compared to Nvidia H100
Up to 2.4x HBM density compared to Nvidia H100
See Endnotes: MI300-05
33 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
AMD Instinct™ MI300X inference advantage
[Chart: number of GPUs required vs. number of model parameters for Falcon (40B), GPT-3 (175B), PaLM 2 (340B), and PaLM (540B); AMD Instinct™ MI300X 192GB vs. competition 80GB.]
See Endnote: MI300-08
34 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Enabling the ecosystem requires industry-standard, easily deployed solutions
35 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
AMD Instinct™ Platform
Introducing today
8x MI300X
1.5 TB HBM3 Memory
Industry-Standard Design
36 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
MI300A: sampling now
MI300X: sampling Q3
37 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
• Features comparable with Nvidia on widely used ML/AI frameworks
• Suite of optimized AI benchmarks available to showcase performance
Start now with a drop-in replacement for top machine learning frameworks, applications, and benchmarks
• Exascale proven: performance, scalability, and reliability
• World-class application team to develop, port, and optimize apps
• Turnkey software: key HPC open-source & ISV applications, leading AI frameworks
• Robust & performant software stack for HPC & AI
38 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
39 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
40 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Broad portfolio of
training and inference
compute engines
Deep ecosystem of
AI partners and
co-innovation
Open and proven
software capabilities
41 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Optimized AI software stack
AI Models and Algorithms
Libraries
Compilers and Tools
Runtime
AMD Instinct™ GPU
AI ecosystem optimized for AMD
Leadership performance
A proven software stack
Use of third party marks/logos/products is for informational purposes only and no endorsement of or by AMD is intended or implied GD-83
42 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Broad AI operator support
AMD ROCm™ 5: comprehensive
suite of data center optimizations
Out-of-the-box support for leading
models and frameworks
Creating an open and portable AI
ecosystem
Continuous framework integration
Unlocking the
AMD Instinct™ accelerator advantage
and more…
Use of third party marks/logos/products is for informational purposes only and no endorsement of or by AMD is intended or implied GD-83
43 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Framework    Source              Container            PIP Wheel
TensorFlow   GitHub              Infinity Hub         pypi.org
PyTorch      GitHub              Infinity Hub         pytorch.org
ONNX-RT      GitHub              Docker Instructions  onnxruntime.ai
JAX          GitHub public fork  Docker Hub           Est Q2 2023
DeepSpeed    DeepSpeed GitHub    Docker Hub           deepspeed.ai
CuPy         cupy.dev            Docker Hub           cupy.dev
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
PyTorch, the PyTorch logo and any related marks are trademarks of Facebook, Inc.
PYTORCH FOUNDATION
FOUNDING MEMBER
44 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Powering data center AI at scale
• #1 Frontier: National Cancer Institute and DOE accelerating cancer research and treatment
• #3 LUMI: largest Finnish language model (TurkuNLP-13B); Allen Institute scientific LLM
• #11 Explorer WUS3: running AI and HPC workloads; T5 NLP with 11B parameters; 1st Korean LLM
Use of third party marks/logos/products is for informational purposes only and no endorsement of or by AMD is intended or implied GD-83
AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
[Quotes from Kevin Scott, Soumith Chintala, Andrew Ng, and Clem Delangue. Note: highlights added by AMD.]
AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Open software approach | Proven AI capabilities | Ready support for AI models
47 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Life Science Physics Chemistry CFD Earth Science
AMBER
GROMACS
NAMD
LAMMPS
Hoomd-Blue
VASP
MILC
GRID
QUANTUM ESPRESSO
N-Body
CHROMA
PIConGPU
QuickSilver
CP2K
QUDA
NWCHEM
TERACHEM
QMCPACK
OpenFOAM
AMR-WIND
NEKBONE
LAGHOS
NEKO
NEKRS
PeleC
EXAGO
DEVITO
OCCA
SPECFEM3D-GLOBE
SPECFEM3D-CARTESIAN
ACECAST (WRF)
MPAS
ICON
Benchmarks
HPL
HPCG
AMG
ML - TORCHBENCH
ML - SUPERBENCH
Libraries
AMReX
Ginkgo
HYPRE
TRILINOS
ML Frameworks
PYTORCH
TENSORFLOW
JAX
ONNX
OPENAI TRITON
ISV Applications
ANSYS MECHANICAL
CADENCE CHARLES
ANSYS FLUENT*
SIEMENS STAR-CCM+*
SIEMENS CALIBRE*
* Porting/optimization in progress
+ MANY MORE
STRONG MOMENTUM AND AN INCREASING LIST OF SUPPORTED APPLICATIONS, LIBRARIES & FRAMEWORKS
48 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
WORLD-LEADING PROVIDER OF ADVANCED
SIMULATION & MODELING SOFTWARE
LEADING PROVIDER OF ACCELERATED MICROSCALE
WEATHER FORECAST SOFTWARE
Devito
AUTOMATIC CODE GENERATION EXECUTES OPTIMIZED
ACCELERATED SEISMIC INVERSION WORKLOADS
HIGH-FIDELITY CFD SOFTWARE BUILT TO TACKLE
ENGINEERING FLOW PROBLEMS
LEADING PROVIDER OF SCIENCE-BASED MULTISCALE,
MULTIPHYSICS MODELING AND SIMULATION APPLICATIONS
OPTIMIZED FOR HIGH-PERFORMANCE GPU COMPUTING
AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
CODE CONVERSION TOOLS
EXTEND YOUR APPLICATION PLATFORM SUPPORT BY CONVERTING CUDA® CODE
SINGLE SOURCE
MAINTAIN PORTABILITY
MAINTAIN PERFORMANCE
Hipify-perl
◢ Easy to use: point it at a directory and it will hipify the CUDA code
◢ Very simple string-replacement technique; may require manual post-processing
◢ Recommended for quick scans of projects
Hipify-clang
◢ More robust translation of the code
◢ Generates warnings and assistance for additional analysis
◢ High-quality translation, particularly for cases where the user is familiar with the make system
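As a concrete illustration, consider a toy CUDA file (a hypothetical saxpy.cu) run through either tool; the translated HIP source sketched below is representative of what the conversion produces. API calls are renamed one-for-one, and the kernel body and triple-chevron launch syntax carry over unchanged, so the result compiles with hipcc as a single portable source.

    // hipify-perl saxpy.cu > saxpy.hip.cpp    (string replacement)
    // hipify-clang saxpy.cu -o saxpy.hip.cpp  (compiler-based; CUDA include flags may be required)

    #include <hip/hip_runtime.h>                    // was: #include <cuda_runtime.h>

    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // kernel body is untouched
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        hipMalloc((void**)&x, n * sizeof(float));   // was: cudaMalloc
        hipMalloc((void**)&y, n * sizeof(float));   // was: cudaMalloc
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y); // launch syntax is preserved
        hipDeviceSynchronize();                     // was: cudaDeviceSynchronize
        hipFree(x); hipFree(y);                     // was: cudaFree
        return 0;
    }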
50 | AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
FEATURE PARITY WITH COMMONLY USED MATH AND COMMUNICATION LIBRARIES
                                             NVIDIA    AMD
Math libraries                               cuBLAS    rocBLAS
                                             cuBLASLt  hipBLASLt
                                             cuFFT     rocFFT
                                             cuSOLVER  rocSOLVER
                                             cuSPARSE  rocSPARSE
                                             cuRAND    rocRAND
Communication library                        NCCL      RCCL
C++ core libraries                           Thrust    rocThrust
                                             CUB       hipCUB
Wave Matrix Multiply Accumulate library      WMMA      rocWMMA
Deep learning / machine learning primitives  cuDNN     MIOpen
C++ templates abstraction for GEMMs          CUTLASS   Composable Kernel
AMD DEVELOPS EQUIVALENT OPTIMIZED LIBRARIES
TO MAXIMIZE PERFORMANCE OF COMMONLY USED HPC AND AI FUNCTIONS
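Because argument order and semantics mirror the CUDA originals, moving a library call is largely a rename. Below is a hedged sketch of a single-precision GEMM through hipBLAS, the thin portability layer that dispatches to rocBLAS on AMD hardware (header path and error checking may vary by ROCm release):

    #include <hipblas.h>  // hipBLAS mirrors the cuBLAS API surface

    // C = alpha*A*B + beta*C for n x n column-major matrices already on the device.
    void sgemm_example(const float* dA, const float* dB, float* dC, int n) {
        hipblasHandle_t handle;
        hipblasCreate(&handle);             // was: cublasCreate
        const float alpha = 1.0f, beta = 0.0f;
        hipblasSgemm(handle,                // was: cublasSgemm, same argument order
                     HIPBLAS_OP_N, HIPBLAS_OP_N,
                     n, n, n,
                     &alpha, dA, n, dB, n,
                     &beta,  dC, n);
        hipblasDestroy(handle);             // was: cublasDestroy
    }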
51 | AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
LOW-FRICTION SOFTWARE PORTING FOR EXISTING NVIDIA USERS TO AMD
• ML frameworks (Python): DROP IN. Out-of-the-box support.
• ML libraries (C++): MIRROR. Equivalent libraries: cuBLAS, cuSparse, cuFFT, NCCL, cuDNN… → rocBLAS, rocSparse, rocFFT, RCCL, MIOpen…
• ML kernel development (C++, Triton IR): PORT/OPTIMIZE. CUDA custom kernels: NVCC → HIPIFY → HIPCC. DROP-IN. Triton kernels: CUDA Triton backend → ROCm Triton backend.
52 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
AMD Instinct™ MI200 SUPPORT
29 key applications & frameworks on Infinity Hub & a catalogue
supporting over 90 applications, frameworks & tools
PERFORMANCE RESULTS
Published Performance Results for Select Apps / Benchmarks
Accelerating Instinct™ accelerator adoption
Over 17,000 application pulls. 10,000+ since last year
AMD.com/InfinityHub
Instinct Application Catalog
AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Engage with AMD ROCm™ Platform Experts
Participate in AMD ROCm Webinar Series
Post questions and view FAQs in the Community Forum
Get Started Using AMD ROCm
AMD ROCm Documentation on GitHub
Download the Latest Version of AMD ROCm
Increase Understanding
Purchase the AMD ROCm textbook
View the latest news in the Blogs
54 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Collateral [AMD.com/Instinct]: Product Brochures, Whitepapers, Qualified Server Catalog, Case Studies, Product Videos
AMD ROCm™ Info Portal [DOCs.AMD.com, AMD.com/ROCm]: AMD ROCm Releases, Tech Docs & FAQs, Compilers, Libraries & Tools, AMD ROCm Learning Ctr, AMD Meet The Expert (MTE)
Software, Containers & Install Guides [AMD.com/InfinityHub]: AMD ROCm, AMD Infinity Hub, Application Catalogue, ML Frameworks, Instinct Benchmarks
Community & News [Community.AMD.com]: AMD ROCm Community, AMD Instinct™ Blogs, AMD ROCm Blogs, AMD Instinct GPU Newsletter, AMD Lab Notes
Funds & Initiatives [AMD.com/AIER, AMD.com/HPC-FUND]: AMD AIER 2.0 Initiative, AMD HPC Fund
55 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
TEAMS AVAILABLE TO HELP YOU: POC ONBOARDING | APPLICATION OPTIMIZATION | POST-DEPLOYMENT SUPPORT
AMD MI210 GPUS: AMD Instinct™ MI210 cards (PCIe®) available now for POCs
FULL SYSTEMS + GPUS: servers with AMD Instinct MI210 available now for POCs
AMD ACCELERATOR CLOUD (AAC): remote access to MI100, MI210 & MI250 [AMD.com/AAC]
56 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
ACADEMIA & RESEARCH
• Leadership in perf & perf/W
• Large memory capacity enabling more simulation per day
• Open SW ecosystem: ROCm™ & CUDA-to-HIP conversion tool to open research possibilities
• Benchmarks & apps: HPL, HPCG, PIConGPU, CoMET, MILC

MATERIAL & LIFE SCIENCE
• High floating-point & memory capacity
• MI200 series OAM and PCIe® form factors support various system configs to right-size the solution
• Catalog of applications on AMD Infinity Hub
• Applications: AMBER, NAMD, LAMMPS, OPENMM, GROMACS, RELION

CLOUD
• Performance leadership & broad SW ecosystem support
• Open & portable software ecosystem providing choice to users
• Large memory capacity for large model training
• Massive I/O for scale-out environments
• Deep partnership with Microsoft Azure

GEOSCIENCE
• Leadership perf & perf/W leads to exceptional TCO benefit
• Massive memory size & bandwidth for large datasets
• CUDA®-to-HIP conversion tool to quickly and simply port existing GPU applications
• Benchmarks & apps: Acoustic ISO, AMG, SpecFem3D

AI/ML
• Joint development with leading ML frameworks: PyTorch & TensorFlow
• Large memory per GPU to support training and large-batch inference
• High-bandwidth GPU-to-GPU comms to scale out massively parallel models
• Models & benchmarks: GPT, BERT, ResNet, VGG
57 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
• MI250X cabinets
• MI250X blades
• 2U & 4U MI250 rack servers
• 1X MI210 blade (20 GPUs in 8U enclosure)
• 4U MI210 rack servers (1-8 GPUs)
• 2U MI210 rack servers (1-4 GPUs)
https://www.amd.com/system/files/documents/amd-instinct-server-soln-catalog.pdf
58 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
WHY CLIENTS CHOOSE AMD INSTINCT ACCELERATORS

COMPELLING PERFORMANCE AND TCO ADVANTAGES IN HPC & AI
• Highly optimized & innovative compute architecture delivering leadership HPC & AI performance
• Industry-leading HBM capacity and bandwidth … critical to most HPC & AI workloads
• Strong Perf/W and Perf/$ advantage for game-changing TCO

SIMPLIFIED PROGRAMMABILITY AND PORTABILITY
• Low-friction portability, simplified programmability & usability to accelerate time to results
• Rich suite of performant, open-source (& equivalent) libraries and SW tools for maximized performance
• Drop-in support for leading AI frameworks and HPC programming models to ease migration

GROWING OPEN SOFTWARE ECOSYSTEM FOR HPC & AI
• Growing repository of ported/optimized applications via deep partnerships with leading HPC & AI clients
• Open-source ROCm software stack & HIP cross-platform programming model
• Continued investment & participation in open-source multi-hardware initiatives (OpenMP, OpenXLA, Triton, MLIR)
59 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
60 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Learn more about AMD Instinct™ MI200
Accelerators and ROCm™ 5 Powering Discoveries
at Exascale
AMD.COM/INSTINCT
62 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
AMD Instinct™
MI250 Brochure
AMD Instinct™
MI210 Brochure
AMD CDNA™ 2
White Paper
AMD ROCm™ 5
Brochure
AMD Instinct™ Accelerator
Qualified Servers
GPU Accelerated
Applications Catalog
63 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
NEW
• HPC Comes to Life with AMD Instinct™ Accelerators and NAMD
• Enhancing LAMMPS Simulations with AMD Instinct™ Accelerators
• Breaking Barriers in Plasma Physics with PIConGPU and AMD Instinct™ MI250 GPU
64 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
Accelerated Computing with HIP
by Yifan Sun, Trinayan Baruah, David R Kaeli
Available through Barnes and Noble
ISBN-13: 9798218107444
65 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
DISCLAIMER AND ATTRIBUTIONS
©2023 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Instinct, EPYC, Infinity Fabric, ROCm, and combinations thereof are trademarks of
Advanced Micro Devices, Inc. PCIe is a registered trademark of PCI-SIG Corporation. OpenCL™ is a registered trademark used under license by Khronos. The OpenMP name
and the OpenMP logo are registered trademarks of the OpenMP Architecture Review Board. TensorFlow, the TensorFlow logo and any related marks are trademarks of
Google Inc. PyTorch, the PyTorch logo and any related marks are trademarks of Facebook, Inc. ANSYS, and any and all ANSYS, Inc. brand, product, service and feature names,
logos and slogans are registered trademarks or trademarks of ANSYS, Inc. or its subsidiaries in the United States or other countries. OPENFOAM® is a registered trademark
of OpenCFD Limited, producer and distributor of the OpenFOAM software via www.openfoam.com.
Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken
in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to
update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or
completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement,
merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described
herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations
applicable to the purchase or use of AMD products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and
Conditions of Sale. GD-18
66 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
ENDNOTES
MI200-01 - World’s fastest data center GPU is the AMD Instinct™ MI250X. Calculations conducted by AMD Performance Labs as of Sep 15, 2021, for the AMD Instinct™ MI250X (128GB HBM2e OAM module) accelerator at 1,700 MHz peak boost
engine clock resulted in 95.7 TFLOPS peak theoretical double precision (FP64 Matrix), 47.9 TFLOPS peak theoretical double precision (FP64), 95.7 TFLOPS peak theoretical single precision matrix (FP32 Matrix), 47.9 TFLOPS peak theoretical single
precision (FP32), 383.0 TFLOPS peak theoretical half precision (FP16), and 383.0 TFLOPS peak theoretical Bfloat16 format precision (BF16) floating-point performance. Calculations conducted by AMD Performance Labs as of Sep 18, 2020 for the
AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak theoretical double precision (FP64), 46.1 TFLOPS peak theoretical single precision matrix (FP32), 23.1 TFLOPS peak
theoretical single precision (FP32), 184.6 TFLOPS peak theoretical half precision (FP16) floating-point performance. Published results on the NVidia Ampere A100 (80GB) GPU accelerator, boost engine clock of 1410 MHz, resulted in 19.5 TFLOPS
peak double precision tensor cores (FP64 Tensor Core), 9.7 TFLOPS peak double precision (FP64). 19.5 TFLOPS peak single precision (FP32), 78 TFLOPS peak half precision (FP16), 312 TFLOPS peak half precision (FP16 Tensor Flow), 39 TFLOPS
peak Bfloat 16 (BF16), 312 TFLOPS peak Bfloat16 format precision (BF16 Tensor Flow), theoretical floating-point performance. The TF32 data format is not IEEE compliant and not included in this comparison.
https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf, page 15, Table 1.
MI200-07 - Calculations conducted by AMD Performance Labs as of Sep 21, 2021, for the AMD Instinct™ MI250X and MI250 (128GB HBM2e) OAM accelerators designed with AMD CDNA™ 2 6nm FinFet process technology at 1,600 MHz peak memory clock resulted in 128GB HBM2e memory capacity and 3.2768 TB/s peak theoretical memory bandwidth performance. MI250/MI250X memory bus interface is 4,096 bits times 2 die and memory data rate is 3.20 Gbps for total memory bandwidth of 3.2768 TB/s ((3.20 Gbps*(4,096 bits*2))/8). The highest published results on the NVidia Ampere A100 (80GB) SXM GPU accelerator resulted in 80GB HBM2e memory capacity and 2.039 TB/s GPU memory bandwidth performance. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet-us-nvidia-1758950-r4-web.pdf
MI200-18 - Calculations conducted by AMD Performance Labs as of Sep 21, 2021, for the AMD Instinct™ MI250X and MI250 accelerators (OAM) designed with CDNA™ 2 6nm FinFet process technology at 1,600 MHz peak memory clock resulted in 128GB HBM2e memory capacity. Published specifications on the NVidia Ampere A100 (80GB) SXM and A100 accelerators (PCIe®) showed 80GB memory capacity. Results found at:
https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet-us-nvidia-1758950-r4-web.pdf
MI200-41 - Calculations conducted by AMD Performance Labs as of Jan 14, 2022, for the AMD Instinct™ MI210 (64GB HBM2e PCIe® card) accelerator at 1,700 MHz peak boost engine clock resulted in 45.3 TFLOPS peak theoretical double
precision (FP64 Matrix), 22.6 TFLOPS peak theoretical double precision (FP64), and 181.0 TFLOPS peak theoretical Bfloat16 format precision (BF16), floating-point performance. Calculations conducted by AMD Performance Labs as of Sep 18,
2020 for the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak theoretical double precision (FP64), and 184.6 TFLOPS peak theoretical half precision (FP16), floating-
point performance. Published results on the NVidia Ampere A100 (80GB) GPU accelerator, boost engine clock of 1410 MHz, resulted in 19.5 TFLOPS peak double precision tensor cores (FP64 Tensor Core), 9.7 TFLOPS peak double precision
(FP64) and 39 TFLOPS peak Bfloat16 format precision (BF16), theoretical floating-point performance. The TF32 data format is not IEEE compliant and not included in this comparison. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-
Center/nvidia-ampere-architecture-whitepaper.pdf, page 15, Table 1.
MI200-44 - Calculations conducted by AMD Performance Labs as of Feb 15, 2022, for the AMD Instinct™ MI210 (64GB HBM2e PCIe® card) 300 Watt accelerator at 1,700 MHz peak boost engine clock resulted in 45.3 TFLOPS peak theoretical
double precision (FP64 Matrix), 22.6 TFLOPS peak theoretical double precision (FP64), 22.6 TFLOPS peak theoretical single precision (FP32), and 181.0 TFLOPS peak theoretical Bfloat16 format precision (BF16), floating-point performance.
Calculations conducted by AMD Performance Labs as of Sep 18, 2020 for the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak theoretical double precision (FP64),
floating-point performance. Published results on the NVidia Ampere A100 (80GB) 300 Watt PCIe® GPU accelerator, boost engine clock of 1410 MHz, resulted in 19.5 TFLOPS peak double precision tensor cores (FP64 Tensor Core), 9.7 TFLOPS
peak double precision (FP64). 19.5 TFLOPS peak single precision (FP32), 78 TFLOPS peak half precision (FP16), 39 TFLOPS peak Bfloat16 format precision (BF16), theoretical floating-point performance.
https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf, page 15, Table 1.
MI200-46 – Testing conducted by AMD performance lab on a 2P socket 3rd Gen AMD EPYC™ 7763 CPU Supermicro 4124, with 8x AMD Instinct™ MI210 GPU (PCIe® 64GB 300W), no AMD Infinity Fabric™ technology enabled. OS: Ubuntu 18.04.6 LTS, ROCm 5.0. Benchmark: Relion v3.1.2, converted with HIP with AMD optimizations to Relion that are not yet available upstream. Vs. Nvidia published measurements: https://developer.nvidia.com/hpc-application-performance accessed 3/13/2022. Nvidia container details found at: https://catalog.ngc.nvidia.com/orgs/hpc/containers/relion; information on Relion found at: https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install. All results measured on systems with 8 GPUs, with 4-GPU XGMI bridges where applicable. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
67 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
ENDNOTES (CONT.)
MI200-47A: - Testing Conducted on 10.03.2022 by AMD performance lab on a 2P socket AMD EPYC™ ‘7763 CPU Supermicro 4124 with 4x and 8x AMD Instinct™ MI210 GPUs (PCIe® 64GB,300W) with AMD Infinity Fabric™ technology enabled.
SBIOS2.2, Ubuntu® 18.04.6 LTS, host ROCm™ 5.2.0. LAMMPS container amdih-2022.5.04_130 (ROCm 5.1) versus. Nvidia public claims for 4x and 8x A100 PCIe 80GB GPUs http://web.archive.org/web/20220718053400/
https://developer.nvidia.com/hpc-application-performance Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
MI200-49A - Testing Conducted on 11.14.2022 by AMD performance lab on a 2P socket AMD EPYC™ ‘7763 CPU Supermicro 4124 with 8x AMD Instinct™ MI210 GPU (PCIe® 64GB,300W), AMD Infinity Fabric™ technology enabled. Ubuntu 18.04.6
LTS, Host ROCm 5.2.0, rocHPL 6.0.0. Results calculated from medians of five runs Vs. Testing Conducted by AMD performance on 2P socket AMD EPYC™ 7763 Supermicro 4124 with 8x NVIDIA A100 GPU (PCIe 80GB 300W), Ubuntu 18.04.6 LTS,
CUDA 11.6, HPL Nvidia container image 21.4-HPL. All results measured on systems configured with 8 GPUs;2 pairs of 4xMI210 GPUs connected by a 4-way Infinity Fabric™ link bridge. 4 Pairs of 2xA100 80GB PCIe GPUs connected by a 2-way
NVLink bridges. Information on HPL: https://www.netlib.org/benchmark/hpl/. AMD HPL Container Details: HPL Container not yet available on Infinity Hub. Nvidia HPL Container Detail: https://ngc.nvidia.com/catalog/containers/nvidia:hpc-
benchmarks. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
MI200-69A - Testing Conducted by AMD performance lab 11.14.2022 using HPL comparing two systems. 2P EPYC™ 7763 powered server, SMT disabled, with 1x, 2x, and 4x AMD Instinct™ MI250 (128 GB HBM2e) 560W GPUs, host ROCm 5.2.0
rocHPL6.0.0. AMD HPL container is not yet available on Infinity Hub. vs. 2P AMD EPYC™ 7742 server, SMT enabled, with 1x, 2x, and4x Nvidia Ampere A100 80GB SXM 400W GPUs, CUDA 11.6 and Driver Version 510.47.03. HPL Container
(nvcr.io/nvidia/hpc-benchmarks:21.4-hpl) obtained from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of
latest drivers and optimizations.
MI200-70A Testing Conducted by AMD performance lab 11.2.2022 using HPCG 3.0 comparing two systems: 2P EPYC™ 7763 powered server, SMT disabled, with 1x, 2x, and 4x AMD Instinct™ MI250 (128 GB HBM2e) 560W GPUs, SBIOS M12,
Ubuntu 20.04.4, Host ROCm 5.2.0, AMD HPCG 3.0 container is not yet available on Infinity Hub. vs. 2P AMD EPYC™ 7742 server with 1x, 2x, and 4x Nvidia Ampere A100 80GB SXM 400W GPUs, SBIOS 0.34, Ubuntu 20.04.4, CUDA 11.6. HPCG 3.0
Container: nvcr.io/nvidia/hpc-benchmarks:21.4-hpcg at https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of
latest drivers and optimizations.
MI200-72 - SuperBench v0.6 model training comparison based on AMD internal testing as of 11/09/2022 measuring the total training throughput using a 2P AMD EPYC™ 7763 CPU server tested with 1x AMD Instinct™ MI250 (128GB HBM2e)
560W GPUs with Infinity Fabric™ technology; ROCm™ 5.2, PyTorch 1.12 versus a 2P EPYC™ 7742 CPU server with 1x Nvidia Ampere A100 (80GB HBM2e) 400W SXM GPU, CUDA® 11.6, PyTorch 1.8.0; Server manufacturers may vary
configurations, yielding different results. Performance may vary based factors including use of latest drivers and optimizations.
MI200-73 - Testing Conducted by AMD performance lab 8.26.22 using AMBER: Cellulose_production_NPT_4fs, Cellulose_production_NVE_4fs, FactorIX_production_NPT_4fs, FactorIX_production_NVE_4fs, STMV_production_NPT_4fs, and
STMV_production_NVE_4fs, JAC_production_NPT_4fs, JAC production_NVE-4fs comparing two systems: 2P EPYC™ 7763 powered server with 1x A MD Instinct™ MI250 (128GB HBM2e) 560W GPU, ROCm 5.2.0, Amber container 22.amd_100 vs.
2P EPYC™ 7742 powered server with 1x Nvidia A100 SXM (80GB HBM2e) 400W GPU, MPS enabled (2 instances), CUDA 11.6. Server manufacturers may vary configurations, yielding different results. Performance may vary based factors including
use of latest drivers and optimizations.
MI200-74 - Testing Conducted by AMD performance lab 10.18.22 using Gromacs: STMV Comparing two systems: 2P EPYC™ 7763 powered server with 4x AMD Instinct™ MI250 (128GB HBM2e) 560W GPU with Infinity Fabric™ technology,
ROCm™ 5.2.0, Gromacs container 2022.3.amd1_174 vs. Nvidia public claims https://developer.nvidia.com/hpc-application-performance. (Gromacs 2022.2). Dual EPYC 7742 with 4x Nvidia Ampere A100 SXM 80GB GPUs. Server manufacturers
may vary configurations, yielding different results. Performance may vary based factors including use of latest drivers and optimizations.
MI200-75 - Testing Conducted by AMD performance lab 9.13.22 using OpenMM at double precision: gbsa, rf, pme, amoebagk, amoebapme, apoa1rf, apoa1pme, apoa1ljpme at double precision Comparing two systems: 2P EPYC™ 7763 powered
server with 1x AMD Instinct™ MI250 with AMD Infinity Fabric Technology™ (128GB HBM2e) 560W GPU, ROCm 5.2.0, OpenMM container 7.7.0_49 vs. 2P EPYC™ 7742 powered server with 1x Nvidia A100 SXM (80GB HBM2e) 400W GPU, MPS
enabled (2 instances), CUDA 11.6, OpenMM 7.7.0. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
MI200-76 - Testing Conducted by AMD performance lab 9.13.22 using NAMD: STMV_NVE, APOA1_NVE Comparing two systems: 2P EPYC™ 7763 powered server with 1x, 2x and 4x AMD Instinct™ MI250 (128GB HBM2e) 560W GPU with AMD
Infinity Fabric™ technology, ROCm 5.2.0, NAMD container namd3:3.0a9 vs. Nvidia public claims of performance of 2P EPYC 7742 Server with 1x, 2x, and 4x Nvidia Ampere A100 80GB SXM GPU https://developer.nvidia.com/hpc-application-
performance. (v2.15a AVX-512). Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
68 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
ENDNOTES (CONT.)
MI200-77 - Testing Conducted by AMD performance lab 10.03.22 using LAMMPS: LJ, ReaxFF and Tersoff Comparing two systems: 2P EPYC™ 7763 powered server with 4x AMD Instinct™ MI250 (128GB HBM2e) 560W GPU, ROCm 5.2.0, LAMMPS
container amdih/lammps:2022.5.04_130 vs. Nvidia public claims http://web.archive.org/web/20220718053400/https://developer.nvidia.com/hpc-application-performance. Server manufacturers may vary configurations, yielding different
results. Performance may vary based on use of latest drivers and optimizations.
MI200-78 - Testing Conducted by AMD performance lab on 9/10/2022 using Quicksilver (https://github.com/LLNL/Quicksilver.git) comparing two systems: 2P socket AMD EPYC™ 7763 CPU Supermicro 4124 with 8x AMD Instinct™ MI210 GPU
(PCIe® 64GB,300W), AMD Infinity Fabric™ technology enabled. Ubuntu 18.04.6 LTS, Host ROCm 5.2.0 vs. 2P socket AMD EPYC™ 7763 Supermicro 4124 with 8x NVIDIA A100 GPU (PCIe 80GB 300W), Ubuntu 18.04.6 LTS, CUDA 11.6, Server
manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
MI200-79 - Testing Conducted by AMD performance lab on 8/12/2022 using AMG Solve comparing two systems: 2P EPYC™ 7763 powered server, SMT disabled, with 1x, 2x, and 4x AMD Instinct™ MI250 (128 GB HBM2e) 560W GPUs with Infinity
Fabric technology, SBIOS M12, Ubuntu 20.04.4, Host ROCm 5.2.0 vs. 2P AMD EPYC™ 7742 server, SMT enabled, with 1x, 2x, and 4x Nvidia Ampere A100 80GB SXM 400W GPUs, SBIOS 0.34, Ubuntu 20.04.4, CUDA 11.6. Server manufacturers may
vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
MI200-80 HuggingFace Transformers training throughput comparison based on AMD internal testing as of 1/17/2023 using a 2P AMD EPYC™ 7763 CPU server tested with 4x AMD Instinct™ MI250 (128GB HBM2e) 560W GPUs with Infinity
Fabric™ technology; ROCm™ 5.4, PyTorch 1.12.1, base image: rocm/pytorch:rocm5.4_ubuntu20.04_py3.8_pytorch_1.12.1 versus a 2P EPYC™ 7742 CPU server with 4x Nvidia Ampere A100 (80GB HBM2e) 400W SXM GPU, CUDA® 11.8, PyTorch
1.13.0a0; base image: nvcr.io/nvidia/pytorch:22.11-py3. Server manufacturers may vary configurations, yielding different results. Performance may vary based on factors including use of latest drivers and optimizations.
MI200-81 HPL-AI comparison based on AMD internal testing as of 11/2/2022 measuring the HPL-AI benchmark performance (TFLOPS) using a server with 2x EPYC™ 7763 with 4x MI250 (128MB HBM2e) with Infinity Fabric running host ROCm™
5.2.0, HPL-AI-AMD v1.0.0; AMD HPL-AI container not yet available on Infinity Hub. versus a server with 2x EPYC 7742 with 4x A100 SXM (80GB HBM2e) running CUDA® 11.6, HPL-AI-NVIDIA v2.0.0, container nvcr.io/nvidia/hpc-benchmarks:21.4-
hpl. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
MI200-82: "Testing Conducted by AMD performance lab on 11/25/2022 using PyFR TGV and NACA 0021 comparing two systems:2P EPYC™ 7763 powered server, SMT disabled, with 1x AMD Instinct™ MI250 (128 GB HBM2e) 560W GPUs with
AMD Infinity Guard technology, SBIOS M12, Ubuntu 20.04.4, Host ROCm 5.2.0 vs. 2P AMD EPYC™ 7742 server, SMT enabled, with 1x Nvidia Ampere A100 80GB SXM 400W GPUs, SBIOS 0.34, Ubuntu 20.04.4, CUDA 11.6. Server manufacturers may
vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations."
MI200-85: Testing Conducted on 10.18.2022 by AMD performance lab on a 2P socket AMD EPYC™ 7763 CPU Supermicro 4124 with 4x and 8x AMD Instinct™ MI210 GPUs (PCIe® 64GB,300W) with AMD Infinity Fabric™ technology enabled.
SBIOS2.2, Ubuntu® 18.04.6 LTS, host ROCm™ 5.2.0. Gromacs container 2022.3.amd1_174 (ROCm 5.3) versus Nvidia public claims for 4x and 8x A100 PCIe 80GB https://developer.nvidia.com/hpc-application-performance. (Gromacs 2022.3).
Server manufacturers may vary configurations, yielding different results. Performance may vary based factors including use of latest drivers and optimizations.
MI200-86: 2P 3rd Gen AMD EPYC™ 7763 CPU server with 1x AMD Instinct™ MI210 (64 GB HBM2e, PCIe®) GPU performance running OpenMM at double precision: gbsa is ~1.1x (~10%) faster, amoebagk is ~1.0x (comparable), and amoebapme is ~1.1x (~10%) faster versus a 2P 3rd Gen AMD EPYC™ 7763 CPU server with 1x Nvidia Ampere A100 80GB PCIe®, 300W. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
MI200-87: Testing Conducted on 8.12.2022 by AMD performance lab on a 2P socket AMD EPYC™ ‘7763 CPU production platform with 1x, 2x, 4x, 8x AMD Instinct™ MI210 GPUs (PCIe® 64GB,300W) with AMD Infinity Fabric™ technology enabled.
SBIOS2.2, Ubuntu® 18.04.6 LTS, ROCm™ 5.0.0. versus 2P socket AMD EPYC™ 7763 production platform with 1x, 2x, 4x, 8x NVIDIA A100 GPUs (PCIe 80GB 300W) with NVLink technology enabled. SBIOS 2.3a, Ubuntu 18.04.6 LTS, CUDA® 11.6
Update 2. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
MI200-88: Testing Conducted by AMD performance lab between 10.03.2022 and 11.14.2022 using HPL, Gromacs STMV and LAMMPS ReaxFF comparing two systems. 2P AMD EPYC™ ‘7763 CPU production platform with 2x, 4x and 8x AMD
Instinct™ MI210 GPU (PCIe® 64GB,300W), AMD Infinity Fabric™ technology enabled. Ubuntu 18.04.6 LTS, versus 2P AMD EPYC™ 7763 CPU production platform with 1x, 2x, and 4x AMD Instinct™ MI250 (128 GB HBM2e) 560W GPUs, AMD Infinity
Fabric™ technology enabled. Ubuntu 20.04.4 LTS. Both systems compare using Host ROCm 5.2.0, rocHPL6.0.0. (ROCm5.3), Gromacs container 2022.3.amd1_174 (ROCm5.3), LAMMPS container amdih-2022.5.04_130 (ROCm5.1). Server
manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
MI200-90: Testing Conducted by AMD performance lab on 04/14/2023 using OpenFOAM v2206 on a 2P EPYC 7763 CPU production server with 1x, 2x, and 4x AMD Instinct™ MI250 GPUs (128GB, 560W) with AMD Infinity Fabric™ technology
enabled, ROCm™5.3.3, Ubuntu® 20.04.4 vs a 2P EPYC 7742 CPU production server with 1x, 2x, and 4x Nvidia A100 80GB SXM GPUs (400W) with NVLink technology enabled, CUDA® 11.8, Ubuntu 20.04.4. Server manufacturers may vary
configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
MI200-91: Testing Conducted by AMD performance lab on 04/14/2023 using OpenFOAM v2206 on a 2P EPYC 7763 CPU production server with 1x, 2x, 4x and 8x AMD Instinct™ MI210 GPUs (PCIe® 64GB,300W) with AMD Infinity Fabric™
technology enabled, ROCm5.3.3, Ubuntu 20.04.4 vs a 2P EPYC 7763 CPU production server with 1x, 2x, 4x and 8x Nvidia A100 80GB PCIe GPUs (300W) with NVLink technology enabled, CUDA 11.8, Ubuntu 20.04.4. Server manufacturers may vary
configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
69 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
ENDNOTES (CONT.)
MI300-003 - Measurements by AMD Performance Labs June 4, 2022 on current specification and/or estimation for estimated delivered FP8 floating point performance with structure sparsity supported for AMD Instinct™ MI300 vs. MI250X FP16
(306.4 estimated delivered TFLOPS based on 80% of peak theoretical floating-point performance). MI300 performance based on preliminary estimates and expectations. Final performance may vary.
MI300-004 – Measurements by AMD Performance Labs June 4, 2022. MI250X (560W) FP16 (306.4 estimated delivered TFLOPS based on 80% of peak theoretical floating-point performance). MI300 FP8 performance based on preliminary
estimates and expectations. MI300 TDP power based on preliminary projections. Actual results based on production silicon may vary.
MI300-005: Calculations conducted by AMD Performance Labs as of May 17, 2023, for the AMD Instinct™ MI300X OAM accelerator 750W (192 GB HBM3) designed with AMD CDNA™ 3 5nm FinFet process technology resulted in 192 GB HBM3 memory capacity and 5.218 TB/s sustained peak memory bandwidth performance. MI300X memory bus interface is 8,192 bits and memory data rate is 5.6 Gbps for total sustained peak memory bandwidth of 5.218 TB/s (8,192 bits memory bus interface * 5.6 Gbps memory data rate / 8) * 0.91 delivered adjustment. The highest published results on the NVidia Hopper H100 (80GB) SXM GPU accelerator resulted in 80GB HBM3 memory capacity and 3.35 TB/s GPU memory bandwidth performance.
MI300-07K: Measurements by internal AMD Performance Labs as of June 2, 2023 on current specifications and/or internal engineering calculations. Large Language Model (LLM) run or calculated with FP16 precision to determine the minimum
number of GPUs needed to run the Falcon (40B parameter) models. Tested result configurations: AMD Lab system consisting of 1x EPYC 9654 (96-core) CPU with 1x AMD Instinct™ MI300X (192GB HBM3, OAM Module) 750W accelerator tested at
FP16 precision. Server manufacturers may vary configuration offerings yielding different results.
MI300-08K - Measurements by internal AMD Performance Labs as of June 2, 2023 on current specifications and/or internal engineering calculations. Large Language Model (LLM) run comparisons with FP16 precision to determine the minimum number of GPUs needed to run the Falcon (40B parameters), GPT-3 (175 billion parameters), PaLM 2 (340 billion parameters), and PaLM (540 billion parameters) models. Calculated estimates based on GPU-only memory size versus memory required by the model at defined parameters plus 10% overhead. Calculations rely on published and sometimes preliminary model memory sizes. Tested result configurations: AMD Lab system consisting of 1x EPYC 9654 (96-core) CPU with 1x AMD Instinct™ MI300X (192GB HBM3, OAM Module) 750W accelerator vs. competitive testing done on Cirrascale Cloud Services comparable instance with permission.
Results (FP16 precision):
Model       Parameters    Total Memory Required    MI300X Required    Competition Required
Falcon-40B  40 billion    88 GB                    1 (actual)         2 (actual)
GPT-3       175 billion   385 GB                   3 (calculated)     5 (calculated)
PaLM 2      340 billion   748 GB                   4 (calculated)     10 (calculated)
PaLM        540 billion   1188 GB                  7 (calculated)     15 (calculated)
Calculated estimates may vary based on final model size; actual and estimates may vary due to actual overhead required and using system memory beyond that of the GPU. Server manufacturers may vary configuration offerings yielding
different results
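Read as a formula, the table above follows from FP16 storage at 2 bytes per parameter plus the 10% overhead described in this endnote:

    GPUs required = ceil( (2 bytes × parameters × 1.1) / GPU memory )

Worked example for GPT-3: 2 × 175B × 1.1 ≈ 385 GB, so ceil(385 / 192) = 3 MI300X accelerators versus ceil(385 / 80) = 5 of the 80GB competitor, matching the table.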
Rebekah Rodriguez
 
HP Discover
HP DiscoverHP Discover
HP Discover
AMD
 
Delivering Breakthrough Performance Per Core with AMD EPYC
Delivering Breakthrough Performance Per Core with AMD EPYCDelivering Breakthrough Performance Per Core with AMD EPYC
Delivering Breakthrough Performance Per Core with AMD EPYC
Rebekah Rodriguez
 
Where will the next 80% improvement in data center performance come from?
Where will the next 80% improvement in data center performance come from?Where will the next 80% improvement in data center performance come from?
Where will the next 80% improvement in data center performance come from?
Schneider Electric
 
Arm Neoverse solutions @Graviton2-AWS Japan Webinar Oct2020
Arm Neoverse solutions @Graviton2-AWS Japan Webinar Oct2020Arm Neoverse solutions @Graviton2-AWS Japan Webinar Oct2020
Arm Neoverse solutions @Graviton2-AWS Japan Webinar Oct2020
Toshinori Kujiraoka
 
Trends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systemsTrends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systems
Igor José F. Freitas
 

Similar to PCCC23:日本AMD株式会社 テーマ1「AMD Instinct™ アクセラレーターの概要」 (20)

ISSCC "Carrizo"
ISSCC "Carrizo"ISSCC "Carrizo"
ISSCC "Carrizo"
 
Drive Data Center Efficiency with SuperBlade, Powered by AMD EPYC™ and Instinct™
Drive Data Center Efficiency with SuperBlade, Powered by AMD EPYC™ and Instinct™Drive Data Center Efficiency with SuperBlade, Powered by AMD EPYC™ and Instinct™
Drive Data Center Efficiency with SuperBlade, Powered by AMD EPYC™ and Instinct™
 
7th Generation AMD PRO APUs Press Deck
7th Generation AMD PRO APUs Press Deck7th Generation AMD PRO APUs Press Deck
7th Generation AMD PRO APUs Press Deck
 
Seyer June06 Analyst Day
Seyer June06 Analyst DaySeyer June06 Analyst Day
Seyer June06 Analyst Day
 
dassault-systemes-catia-application-scalability-guide
dassault-systemes-catia-application-scalability-guidedassault-systemes-catia-application-scalability-guide
dassault-systemes-catia-application-scalability-guide
 
AMD Bridges the X86 and ARM Ecosystems for the Data Center
AMD Bridges the X86 and ARM Ecosystems for the Data Center AMD Bridges the X86 and ARM Ecosystems for the Data Center
AMD Bridges the X86 and ARM Ecosystems for the Data Center
 
Supermicro’s Universal GPU: Modular, Standards Based and Built for the Future
Supermicro’s Universal GPU: Modular, Standards Based and Built for the FutureSupermicro’s Universal GPU: Modular, Standards Based and Built for the Future
Supermicro’s Universal GPU: Modular, Standards Based and Built for the Future
 
Embedded Platforms Launch Press Presentation
Embedded Platforms Launch Press PresentationEmbedded Platforms Launch Press Presentation
Embedded Platforms Launch Press Presentation
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
 
Marv Wexler - Transform Your with AI.pdf
Marv Wexler - Transform Your with AI.pdfMarv Wexler - Transform Your with AI.pdf
Marv Wexler - Transform Your with AI.pdf
 
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super AffordableSupermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
 
AMD Ryzen Pro
AMD Ryzen ProAMD Ryzen Pro
AMD Ryzen Pro
 
IBM Special Announcement session Intel #IDF2013 September 10, 2013
IBM Special Announcement session Intel #IDF2013 September 10, 2013IBM Special Announcement session Intel #IDF2013 September 10, 2013
IBM Special Announcement session Intel #IDF2013 September 10, 2013
 
22by7 and DellEMC Tech Day July 20 2017 - Power Edge
22by7 and DellEMC Tech Day July 20 2017 - Power Edge22by7 and DellEMC Tech Day July 20 2017 - Power Edge
22by7 and DellEMC Tech Day July 20 2017 - Power Edge
 
Delivering Breakthrough Performance Per Core with AMD EPYC
Delivering Breakthrough Performance Per Core with AMD EPYCDelivering Breakthrough Performance Per Core with AMD EPYC
Delivering Breakthrough Performance Per Core with AMD EPYC
 
HP Discover
HP DiscoverHP Discover
HP Discover
 
Delivering Breakthrough Performance Per Core with AMD EPYC
Delivering Breakthrough Performance Per Core with AMD EPYCDelivering Breakthrough Performance Per Core with AMD EPYC
Delivering Breakthrough Performance Per Core with AMD EPYC
 
Where will the next 80% improvement in data center performance come from?
Where will the next 80% improvement in data center performance come from?Where will the next 80% improvement in data center performance come from?
Where will the next 80% improvement in data center performance come from?
 
Arm Neoverse solutions @Graviton2-AWS Japan Webinar Oct2020
Arm Neoverse solutions @Graviton2-AWS Japan Webinar Oct2020Arm Neoverse solutions @Graviton2-AWS Japan Webinar Oct2020
Arm Neoverse solutions @Graviton2-AWS Japan Webinar Oct2020
 
Trends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systemsTrends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systems
 

More from PC Cluster Consortium

PCCC23:SCSK株式会社 テーマ1「『Azure OpenAI Service』導入支援サービス」
PCCC23:SCSK株式会社 テーマ1「『Azure OpenAI Service』導入支援サービス」PCCC23:SCSK株式会社 テーマ1「『Azure OpenAI Service』導入支援サービス」
PCCC23:SCSK株式会社 テーマ1「『Azure OpenAI Service』導入支援サービス」
PC Cluster Consortium
 
PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」
PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」
PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」
PC Cluster Consortium
 
PCCC23:富士通株式会社 テーマ1「次世代高性能・省電力プロセッサ『FUJITSU-MONAKA』」
PCCC23:富士通株式会社 テーマ1「次世代高性能・省電力プロセッサ『FUJITSU-MONAKA』」PCCC23:富士通株式会社 テーマ1「次世代高性能・省電力プロセッサ『FUJITSU-MONAKA』」
PCCC23:富士通株式会社 テーマ1「次世代高性能・省電力プロセッサ『FUJITSU-MONAKA』」
PC Cluster Consortium
 
PCCC23:東京大学情報基盤センター 「Society5.0の実現を目指す『計算・データ・学習』の融合による革新的スーパーコンピューティング」
PCCC23:東京大学情報基盤センター 「Society5.0の実現を目指す『計算・データ・学習』の融合による革新的スーパーコンピューティング」PCCC23:東京大学情報基盤センター 「Society5.0の実現を目指す『計算・データ・学習』の融合による革新的スーパーコンピューティング」
PCCC23:東京大学情報基盤センター 「Society5.0の実現を目指す『計算・データ・学習』の融合による革新的スーパーコンピューティング」
PC Cluster Consortium
 
PCCC23:富士通株式会社 テーマ3「Fujitsu Computing as a Service (CaaS)」
PCCC23:富士通株式会社 テーマ3「Fujitsu Computing as a Service (CaaS)」PCCC23:富士通株式会社 テーマ3「Fujitsu Computing as a Service (CaaS)」
PCCC23:富士通株式会社 テーマ3「Fujitsu Computing as a Service (CaaS)」
PC Cluster Consortium
 
PCCC23:日本オラクル株式会社 テーマ1「OCIのHPC基盤技術と生成AI」
PCCC23:日本オラクル株式会社 テーマ1「OCIのHPC基盤技術と生成AI」PCCC23:日本オラクル株式会社 テーマ1「OCIのHPC基盤技術と生成AI」
PCCC23:日本オラクル株式会社 テーマ1「OCIのHPC基盤技術と生成AI」
PC Cluster Consortium
 
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PC Cluster Consortium
 
PCCC23:Pacific Teck Japan テーマ1「データがデータを生む時代に即したストレージソリューション」
PCCC23:Pacific Teck Japan テーマ1「データがデータを生む時代に即したストレージソリューション」PCCC23:Pacific Teck Japan テーマ1「データがデータを生む時代に即したストレージソリューション」
PCCC23:Pacific Teck Japan テーマ1「データがデータを生む時代に即したストレージソリューション」
PC Cluster Consortium
 
PCCC23:株式会社計算科学 テーマ1「VRシミュレーションシステム」
PCCC23:株式会社計算科学 テーマ1「VRシミュレーションシステム」PCCC23:株式会社計算科学 テーマ1「VRシミュレーションシステム」
PCCC23:株式会社計算科学 テーマ1「VRシミュレーションシステム」
PC Cluster Consortium
 
PCCC22:株式会社アックス テーマ1「俺ASICとロボットと論理推論AI」
PCCC22:株式会社アックス テーマ1「俺ASICとロボットと論理推論AI」PCCC22:株式会社アックス テーマ1「俺ASICとロボットと論理推論AI」
PCCC22:株式会社アックス テーマ1「俺ASICとロボットと論理推論AI」
PC Cluster Consortium
 
PCCC22:日本AMD株式会社 テーマ1「第4世代AMD EPYC™ プロセッサー (Genoa) の概要」
PCCC22:日本AMD株式会社 テーマ1「第4世代AMD EPYC™ プロセッサー (Genoa) の概要」PCCC22:日本AMD株式会社 テーマ1「第4世代AMD EPYC™ プロセッサー (Genoa) の概要」
PCCC22:日本AMD株式会社 テーマ1「第4世代AMD EPYC™ プロセッサー (Genoa) の概要」
PC Cluster Consortium
 
PCCC22:富士通株式会社 テーマ3「量子シミュレータ」
PCCC22:富士通株式会社 テーマ3「量子シミュレータ」PCCC22:富士通株式会社 テーマ3「量子シミュレータ」
PCCC22:富士通株式会社 テーマ3「量子シミュレータ」
PC Cluster Consortium
 
PCCC22:富士通株式会社 テーマ1「Fujitsu Computing as a Service (CaaS)」
PCCC22:富士通株式会社 テーマ1「Fujitsu Computing as a Service (CaaS)」PCCC22:富士通株式会社 テーマ1「Fujitsu Computing as a Service (CaaS)」
PCCC22:富士通株式会社 テーマ1「Fujitsu Computing as a Service (CaaS)」
PC Cluster Consortium
 
PCCC22:日本電気株式会社 テーマ1「AI/ビッグデータ分析に最適なプラットフォーム NECのベクトルプロセッサ『SX-Aurora TSUBASA』」
PCCC22:日本電気株式会社 テーマ1「AI/ビッグデータ分析に最適なプラットフォーム NECのベクトルプロセッサ『SX-Aurora TSUBASA』」PCCC22:日本電気株式会社 テーマ1「AI/ビッグデータ分析に最適なプラットフォーム NECのベクトルプロセッサ『SX-Aurora TSUBASA』」
PCCC22:日本電気株式会社 テーマ1「AI/ビッグデータ分析に最適なプラットフォーム NECのベクトルプロセッサ『SX-Aurora TSUBASA』」
PC Cluster Consortium
 
PCCC22:東京大学情報基盤センター 「Society5.0の実現を目指す「計算・データ・学習」の融合による革新的スーパーコンピューティング」
PCCC22:東京大学情報基盤センター 「Society5.0の実現を目指す「計算・データ・学習」の融合による革新的スーパーコンピューティング」PCCC22:東京大学情報基盤センター 「Society5.0の実現を目指す「計算・データ・学習」の融合による革新的スーパーコンピューティング」
PCCC22:東京大学情報基盤センター 「Society5.0の実現を目指す「計算・データ・学習」の融合による革新的スーパーコンピューティング」
PC Cluster Consortium
 
PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」03
PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」03PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」03
PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」03
PC Cluster Consortium
 
PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」01
PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」01PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」01
PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」01
PC Cluster Consortium
 
PCCC22:日本マイクロソフト株式会社 テーマ1「HPC on Microsoft Azure」
PCCC22:日本マイクロソフト株式会社 テーマ1「HPC on Microsoft Azure」PCCC22:日本マイクロソフト株式会社 テーマ1「HPC on Microsoft Azure」
PCCC22:日本マイクロソフト株式会社 テーマ1「HPC on Microsoft Azure」
PC Cluster Consortium
 
PCCC22:インテル株式会社 テーマ3「インテル® oneAPI ツールキット 最新情報のご紹介」
PCCC22:インテル株式会社 テーマ3「インテル® oneAPI ツールキット 最新情報のご紹介」PCCC22:インテル株式会社 テーマ3「インテル® oneAPI ツールキット 最新情報のご紹介」
PCCC22:インテル株式会社 テーマ3「インテル® oneAPI ツールキット 最新情報のご紹介」
PC Cluster Consortium
 
PCCC22:インテル株式会社 テーマ2「次世代インテル® Xeon™ プロセッサーを中心としたインテルのHPC-AI最新情報」
PCCC22:インテル株式会社 テーマ2「次世代インテル® Xeon™ プロセッサーを中心としたインテルのHPC-AI最新情報」PCCC22:インテル株式会社 テーマ2「次世代インテル® Xeon™ プロセッサーを中心としたインテルのHPC-AI最新情報」
PCCC22:インテル株式会社 テーマ2「次世代インテル® Xeon™ プロセッサーを中心としたインテルのHPC-AI最新情報」
PC Cluster Consortium
 

More from PC Cluster Consortium (20)

PCCC23:SCSK株式会社 テーマ1「『Azure OpenAI Service』導入支援サービス」
PCCC23:SCSK株式会社 テーマ1「『Azure OpenAI Service』導入支援サービス」PCCC23:SCSK株式会社 テーマ1「『Azure OpenAI Service』導入支援サービス」
PCCC23:SCSK株式会社 テーマ1「『Azure OpenAI Service』導入支援サービス」
 
PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」
PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」
PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」
 
PCCC23:富士通株式会社 テーマ1「次世代高性能・省電力プロセッサ『FUJITSU-MONAKA』」
PCCC23:富士通株式会社 テーマ1「次世代高性能・省電力プロセッサ『FUJITSU-MONAKA』」PCCC23:富士通株式会社 テーマ1「次世代高性能・省電力プロセッサ『FUJITSU-MONAKA』」
PCCC23:富士通株式会社 テーマ1「次世代高性能・省電力プロセッサ『FUJITSU-MONAKA』」
 
PCCC23:東京大学情報基盤センター 「Society5.0の実現を目指す『計算・データ・学習』の融合による革新的スーパーコンピューティング」
PCCC23:東京大学情報基盤センター 「Society5.0の実現を目指す『計算・データ・学習』の融合による革新的スーパーコンピューティング」PCCC23:東京大学情報基盤センター 「Society5.0の実現を目指す『計算・データ・学習』の融合による革新的スーパーコンピューティング」
PCCC23:東京大学情報基盤センター 「Society5.0の実現を目指す『計算・データ・学習』の融合による革新的スーパーコンピューティング」
 
PCCC23:富士通株式会社 テーマ3「Fujitsu Computing as a Service (CaaS)」
PCCC23:富士通株式会社 テーマ3「Fujitsu Computing as a Service (CaaS)」PCCC23:富士通株式会社 テーマ3「Fujitsu Computing as a Service (CaaS)」
PCCC23:富士通株式会社 テーマ3「Fujitsu Computing as a Service (CaaS)」
 
PCCC23:日本オラクル株式会社 テーマ1「OCIのHPC基盤技術と生成AI」
PCCC23:日本オラクル株式会社 テーマ1「OCIのHPC基盤技術と生成AI」PCCC23:日本オラクル株式会社 テーマ1「OCIのHPC基盤技術と生成AI」
PCCC23:日本オラクル株式会社 テーマ1「OCIのHPC基盤技術と生成AI」
 
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
 
PCCC23:Pacific Teck Japan テーマ1「データがデータを生む時代に即したストレージソリューション」
PCCC23:Pacific Teck Japan テーマ1「データがデータを生む時代に即したストレージソリューション」PCCC23:Pacific Teck Japan テーマ1「データがデータを生む時代に即したストレージソリューション」
PCCC23:Pacific Teck Japan テーマ1「データがデータを生む時代に即したストレージソリューション」
 
PCCC23:株式会社計算科学 テーマ1「VRシミュレーションシステム」
PCCC23:株式会社計算科学 テーマ1「VRシミュレーションシステム」PCCC23:株式会社計算科学 テーマ1「VRシミュレーションシステム」
PCCC23:株式会社計算科学 テーマ1「VRシミュレーションシステム」
 
PCCC22:株式会社アックス テーマ1「俺ASICとロボットと論理推論AI」
PCCC22:株式会社アックス テーマ1「俺ASICとロボットと論理推論AI」PCCC22:株式会社アックス テーマ1「俺ASICとロボットと論理推論AI」
PCCC22:株式会社アックス テーマ1「俺ASICとロボットと論理推論AI」
 
PCCC22:日本AMD株式会社 テーマ1「第4世代AMD EPYC™ プロセッサー (Genoa) の概要」
PCCC22:日本AMD株式会社 テーマ1「第4世代AMD EPYC™ プロセッサー (Genoa) の概要」PCCC22:日本AMD株式会社 テーマ1「第4世代AMD EPYC™ プロセッサー (Genoa) の概要」
PCCC22:日本AMD株式会社 テーマ1「第4世代AMD EPYC™ プロセッサー (Genoa) の概要」
 
PCCC22:富士通株式会社 テーマ3「量子シミュレータ」
PCCC22:富士通株式会社 テーマ3「量子シミュレータ」PCCC22:富士通株式会社 テーマ3「量子シミュレータ」
PCCC22:富士通株式会社 テーマ3「量子シミュレータ」
 
PCCC22:富士通株式会社 テーマ1「Fujitsu Computing as a Service (CaaS)」
PCCC22:富士通株式会社 テーマ1「Fujitsu Computing as a Service (CaaS)」PCCC22:富士通株式会社 テーマ1「Fujitsu Computing as a Service (CaaS)」
PCCC22:富士通株式会社 テーマ1「Fujitsu Computing as a Service (CaaS)」
 
PCCC22:日本電気株式会社 テーマ1「AI/ビッグデータ分析に最適なプラットフォーム NECのベクトルプロセッサ『SX-Aurora TSUBASA』」
PCCC22:日本電気株式会社 テーマ1「AI/ビッグデータ分析に最適なプラットフォーム NECのベクトルプロセッサ『SX-Aurora TSUBASA』」PCCC22:日本電気株式会社 テーマ1「AI/ビッグデータ分析に最適なプラットフォーム NECのベクトルプロセッサ『SX-Aurora TSUBASA』」
PCCC22:日本電気株式会社 テーマ1「AI/ビッグデータ分析に最適なプラットフォーム NECのベクトルプロセッサ『SX-Aurora TSUBASA』」
 
PCCC22:東京大学情報基盤センター 「Society5.0の実現を目指す「計算・データ・学習」の融合による革新的スーパーコンピューティング」
PCCC22:東京大学情報基盤センター 「Society5.0の実現を目指す「計算・データ・学習」の融合による革新的スーパーコンピューティング」PCCC22:東京大学情報基盤センター 「Society5.0の実現を目指す「計算・データ・学習」の融合による革新的スーパーコンピューティング」
PCCC22:東京大学情報基盤センター 「Society5.0の実現を目指す「計算・データ・学習」の融合による革新的スーパーコンピューティング」
 
PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」03
PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」03PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」03
PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」03
 
PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」01
PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」01PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」01
PCCC22:日本マイクロソフト株式会社 テーマ2「HPC on Azureのお客様事例」01
 
PCCC22:日本マイクロソフト株式会社 テーマ1「HPC on Microsoft Azure」
PCCC22:日本マイクロソフト株式会社 テーマ1「HPC on Microsoft Azure」PCCC22:日本マイクロソフト株式会社 テーマ1「HPC on Microsoft Azure」
PCCC22:日本マイクロソフト株式会社 テーマ1「HPC on Microsoft Azure」
 
PCCC22:インテル株式会社 テーマ3「インテル® oneAPI ツールキット 最新情報のご紹介」
PCCC22:インテル株式会社 テーマ3「インテル® oneAPI ツールキット 最新情報のご紹介」PCCC22:インテル株式会社 テーマ3「インテル® oneAPI ツールキット 最新情報のご紹介」
PCCC22:インテル株式会社 テーマ3「インテル® oneAPI ツールキット 最新情報のご紹介」
 
PCCC22:インテル株式会社 テーマ2「次世代インテル® Xeon™ プロセッサーを中心としたインテルのHPC-AI最新情報」
PCCC22:インテル株式会社 テーマ2「次世代インテル® Xeon™ プロセッサーを中心としたインテルのHPC-AI最新情報」PCCC22:インテル株式会社 テーマ2「次世代インテル® Xeon™ プロセッサーを中心としたインテルのHPC-AI最新情報」
PCCC22:インテル株式会社 テーマ2「次世代インテル® Xeon™ プロセッサーを中心としたインテルのHPC-AI最新情報」
 

Recently uploaded

Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 

Recently uploaded (20)

Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 

PCCC23: AMD Japan, Theme 1: "Overview of AMD Instinct™ Accelerators"

  • 1. 1 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Presenter Name: Date: Presented to:
  • 2. 2 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) such as the features, functionality, performance, availability, timing and expected benefits of AMD Instinct™ accelerators; AMD Instinct™ MI300A accelerators; AMD Instinct™ MI300X accelerators; AMD CDNA™ 3 processors; El Capitan; expected upstreaming backend support for AMD GPUs with Open AI; AMD GPU architecture roadmaps; and expected AMD data center energy efficiencies, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward- looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on current beliefs, assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements. Investors are urged to review in detail the risks and uncertainties in AMD’s Securities and Exchange Commission filings, including but not limited to AMD’s most recent reports on Forms 10-K and 10-Q. AMD does not assume, and hereby disclaims, any obligation to update forward-looking statements made in this presentation, except as may be required by law.​ AMD does not assume, and hereby disclaims, any obligation to update forward-looking statements made in this presentation, except as may be required by law.
  • 3. 3 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC SamsungExynos MagicLeap Tesla PlayStation5 XboxSeriesX|S SteamDeck Radeon™RX7000Series Radeon™W7000Series Ryzen™7000 Series AMDInstinct™MI250 AMDInstinct™MI210 Radeon™PROV620 AWS, MicrosoftAzure Frontier(US) LUMI(Finland/EU) Adastra(France) Setonix(Australia)
  • 4. 4 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC AGENCIES AND ACADEMIC INSTITUTIONS must speed achievement of better lives, economies, communities THE ENTERPRISE must leverage HPC and AI innovations quickly and profitably RESEARCHERS must solve the greatest problems in history faster
  • 5. 5 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Power needs increase exponentially with scale Demand is high for speedy outcomes Architectures are rigid • In 2008, DARPA set an exascale power goal of 20MW by 2015.1 That target has never been reached. • OpenAI’s GPT-3 natural-language processing model can produce 552 metric tons of carbon emissions2 • The growing volume of data is driving 5.6% yearly HPC growth across sectors through 20273 • AI market growth will average 33.6% 2021–20284 • HPC systems’ typical lifecycle: 3–5 years5 • Firms who contribute to open source get up to 100% more value from its use than those who don’t6 1 HPCwire, July 14, 2021; 2 Patterson, David and Gonzalez, Joseph and Le, Quoc and Liang, Chen and Munguia, Lluis-Miquel and Rothchild, Daniel and So, David and Texier, Maud and Dean, Jeff, “Carbon Emissions and Large Neural Network Training” (Figure 4), arXiv, 2021, ©arXiv.org perpetual, non-exclusive license; 3 Blue Weave Consulting Industry Trends & Forecast Report, Global High-Performance Computing (HPC) Market, Oct 2021; 4 Fortune Business Insights Industry Report, Artificial Intelligence, Sept 2021; 5 AWS White Paper, “Challenging Barriers to HPC in the Cloud,” Oct 2019; 6 Nagle, Frank (2018), “Learning by Contributing: Gaining Competitive Advantage Through Contribution to Crowdsourced Public Goods,” Organization Science 29(4):569-587
  • 6. 6 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
    EFFICIENCY: Maximize performance per watt to ease power limitations.
    PLATFORM FLEXIBILITY: Adapt your application set to your goals without having to reinvest.
    VERSATILE PERFORMANCE: Host a variety of workloads simply within the same architecture.
    CODE READINESS: Maintain device-level optimization even as your systems evolve.
  • 7. 7 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC … you could redefine your potential for fast and scalable results with breakthrough performance in HPC and AI? … you had a wide-ranging ecosystem of open software to tap so you don’t always have to code down to the subcomponent? … your systems could scale and support many kinds of workloads with streamlined programmability between the CPU and GPU?
  • 8. 8 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC 8 CSC
  • 9. 9 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC • First to break the Exascale barrier • 7 of the top 10 most efficient systems rely on AMD • 9.95 Eflops on the HPL-MxP mixed-precision benchmark. TOP500, Green500, and HPL-AI lists, as of May 30, 2022. Source: TOP500, Green500, and HPL-MxP lists, as of June 2023
  • 10. 10 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Source: TOP500, Green500, and HPL-MxP lists, as of June 2023
  • 11. 11 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Roadmaps Subject to Change
  • 12. 12 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
  • 13. 13 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
    SUPERCOMPUTING: Exascale proven, with leadership scaling, performance, and efficiency in the market.*
    • Leadership peak performance and performance per watt
    • Memory coherency; high-bandwidth memory; low-latency GPU-to-GPU and CPU-to-GPU interconnect
    • Dedicated software team for joint application porting, optimization, and scaling
    * AMD Instinct MI250 vs Nvidia A100 SXM; see Endnotes MI200-01, MI200-18
    COMMERCIAL: Attractive performance/$ across a broad range of systems and applications
    • Broad portfolio of servers and configurations from leading OEMs/ODMs
    • Exceptional HPL performance with MI250, and ease of deployment with MI210
    • Turnkey software setup for key HPC applications, ISV software, and AI frameworks
    AI/ML: Outstanding solution for large ML model training with MI250, and MI210 for inferencing
    • Leading FP16 performance*
    • Highest memory density* to support large ML models
    • Ease of transition on leading ML frameworks like PyTorch and TensorFlow
    Featured products: MI250X, MI250, MI210
  • 14. 14 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
  • 15. 15 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC [Chart: accelerator comparison across FP64 vector, FP32 vector, FP64 matrix, FP32 matrix, FP16 matrix, memory size, and peak memory bandwidth.] NOTE: The A100 TF32 data format is not IEEE FP32 compliant, so it is not included in this comparison. SEE ENDNOTES: MI200-01, MI200-07
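For reference, the peak-memory-bandwidth figure behind this comparison can be reproduced from the arithmetic spelled out in endnote MI200-07: peak bandwidth = (memory data rate × memory bus width) / 8. For MI250/MI250X that is (3.20 Gbps × (4,096 bits × 2 dies)) / 8 = 3.2768 TB/s, against the published 2.039 TB/s for the A100 (80GB) SXM cited in the same endnote.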
  • 16. 16 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC CSC
    Application and framework ecosystem (2022 → 2023): GROMACS, OpenMM, LAMMPS, NAMD, AMBER, Relion, Hoomd-Blue, MILC, NWChem, OpenFOAM, PETSc, PyTorch, TensorFlow, ONNX, JAX, Ansys Mechanical, Cascade Charles, Altair AcuSolve, TempoQuest WRF/AceCast, VTKm, HACC, AMG Solve, BDAS, Quantum Espresso, VASP, QMCPack, Quicksilver, Nekbone, PENNANT, Kripke, Ansys Fluent, Dassault XFlow, Dassault CST, STAR-CCM+, Devito Pro
    A100 SXM (80GB) vs MI250 OAM (128GB): speedups from ~1.2x to ~2.9x (~1.8x, ~1.5x, ~1.8x, ~1.2x, ~1.3x, ~1.9x, ~1.9x, ~1.7x, ~2.8x, ~2.9x) across LAMMPS (ReaxFF, 4 GPU), GROMACS (STMV, 4 GPU), OpenMM (DP AMOEBAPME, 1 GPU), NAMD3 (STMV NVE, 4 GPU), AMBER (Cellulose NVE, 1 GPU), AMG Solve (4 GPU), OpenFOAM (HPC Motorbike, 4 GPU), PyFR (NACA0021, 1 GPU), HPCG (4 GPU), HPL (4 GPU)
    See Endnotes MI200-69A, MI200-70A, MI200-73, MI200-74, MI200-75, MI200-76, MI200-77, MI200-79, MI200-82, MI200-90
  • 17. AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC JAX | DEEPSPEED | CUPY | MICROSOFT SUPERBENCH [Chart: single-GPU and multi-GPU (4 GPU) model-training comparisons across models from ~66M to ~1.3B parameters (66M, 124M, 336M, 355M, 400M, 560M, 770M, 1.3B).] See Endnotes MI200-72, MI200-80
  • 18. 18 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC AMD DATACENTER ENERGY EFFICIENCY 2020-2025: AMD has a goal to deliver a 30X increase in energy efficiency for AMD processors and accelerators in HPC and AI training. AMD Instinct accelerators power the #2 through #7 Green500 systems.* See Endnotes MI200-69A, MI200-81 *Learn more: www.amd.com/en/corporate-responsibility/data-center-sustainability
  • 19. 19 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
  • 20. 20 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC [Chart: leadership HPC performance (peak TFLOPS) and leadership efficiency (GFLOPS/Watt) by accelerator across FP64 Tensor, FP64 Matrix, FP64, and FP32 data formats.] *The A100 TF32 data format is not IEEE FP32 compliant, so it is not included in this comparison. SEE ENDNOTES: MI200-41, MI200-44
  • 21. 21 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC A100 PCIe® (80GB) vs MI210 PCIe® (64GB): LAMMPS ~1.6x, Gromacs ~1.0x, OpenMM ~1.1x, Quicksilver ~1.3x, AMG Solve ~1.0x, OpenFOAM ~1.0x, HPL ~1.6x. Workloads: LAMMPS (ReaxFF, 8 GPU), GROMACS (STMV, 8 GPU), OpenMM (DP AMOEBAPME, 1 GPU), AMG Solve (FOM_Solv/sec, 8 GPU), Quicksilver (2 GPU), HPL (8 GPU), OpenFOAM (HPC Motorbike, 8 GPU). See Endnotes MI200-47A, MI200-85, MI200-86, MI200-78, MI200-87, MI200-49A, MI200-91
  • 22. AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC ENTERPRISE SOLUTION WHEN SPACE AND POWER ARE AT A PREMIUM
    Benchmark     | MI250 (1 / 2 / 4 GPUs)         | MI210 (2 / 4 / 8 GPUs)
    Gromacs STMV  | ~34.2 / ~61.8 / ~89.3          | ~36.1 / ~64.7 / ~82.7
    LAMMPS ReaxFF | ~6.8E+06 / ~1.4E+07 / ~2.6E+07 | ~7.0E+06 / ~1.4E+07 / ~2.7E+07
    HPL           | ~40.5 / ~80.7 / ~162.0         | ~40.9 / ~81.1 / ~159.7
    See Endnote MI200-88. MI250 LAMMPS ReaxFF: 1 GPU ~6791085.2, 2 GPU ~13545169.9, 4 GPU ~26329097.1; MI210 LAMMPS ReaxFF: 2 GPU ~7042464, 4 GPU ~13795259.8, 8 GPU ~27232174.1
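One way to read the HPL row (an illustrative calculation on the figures above, not an AMD-published metric): scaling is close to linear on both parts. Quadrupling GPUs takes MI250 from ~40.5 to ~162.0, a 4.00x gain (162.0 / (4 × 40.5) = 100% scaling efficiency), and MI210 from ~40.9 to ~159.7, a 3.90x gain (~98%), which is the basis for positioning MI210 where space and power are constrained.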
  • 23. 23 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Roadmaps Subject to Change Sampling
  • 24. 24 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Next-gen AI accelerator architecture 3D packaging with 4th Gen AMD Infinity architecture Optimized for performance and power efficiency Dedicated accelerator engines for AI and HPC
  • 25. 25 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC World's first APU accelerator for AI and HPC: AMD Instinct™ MI300A. Next-gen accelerator architecture | 128GB HBM3 | 5nm and 6nm process technology | Shared-memory CPU + GPU | 24 CPU cores
  • 26. 26 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
    TECHNOLOGY ▪ World's first data center APU, marrying the best of the EPYC™ CPU and the AMD Instinct™ GPU ▪ Groundbreaking design and 3D packaging designed for leadership memory capacity, bandwidth, and reduced application latency ▪ Open, robust, and scalable software with AMD ROCm™
    PERFORMANCE ▪ Outstanding TCO and excellent HPC performance, with ease of programming ▪ CPU-GPU unified memory with 128GB HBM3 for optimized performance and streamlined programmability ▪ Unparalleled compute power over the prior generation, with an estimated 8X AI performance vs. MI250X¹
    EFFICIENCY ▪ APU architecture designed for power savings compared to a discrete implementation ▪ Unified efficiency with "Zen 4" and CDNA™ 3 for fine-grained power sharing and improved efficiency ▪ Estimated 5X AI performance-per-watt improvement vs. the prior-generation MI250X²
    See Endnotes ¹MI300-03 and ²MI300-04. Preliminary data and projections, subject to change.
  • 27. 27 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC • CPU and GPU cores share a unified on-package pool of memory • [Diagram: CPU | GPU | Cache | HBM] See Endnote MI300-03. Preliminary data and projections, subject to change.
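To make the shared-pool idea concrete, the sketch below (our illustration, not code from the deck; the kernel name and sizes are arbitrary) uses HIP managed memory as a stand-in for a unified CPU+GPU allocation: the host writes a buffer, the GPU updates it in place, and the host reads the result with no explicit hipMemcpy staging.

    // Minimal sketch, assuming ROCm/HIP is installed; build with: hipcc unified.cpp
    #include <hip/hip_runtime.h>
    #include <cstdio>

    __global__ void scale(float* x, int n, float a) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= a;                      // GPU touches the allocation the CPU wrote
    }

    int main() {
        const int n = 1 << 20;
        float* x = nullptr;
        hipMallocManaged(&x, n * sizeof(float));   // one allocation visible to CPU and GPU
        for (int i = 0; i < n; ++i) x[i] = 1.0f;   // CPU writes directly
        scale<<<(n + 255) / 256, 256>>>(x, n, 2.0f);
        hipDeviceSynchronize();
        std::printf("x[0] = %.1f\n", x[0]);        // CPU reads the GPU's result: 2.0
        hipFree(x);
        return 0;
    }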
  • 28. 28 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC [No text content recoverable from this slide.]
  • 29. 30 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Expected Late 2023 Content provided by LLNL
  • 30. 31 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Leadership generative AI accelerator: AMD Instinct™ MI300X. Introducing today: 192GB HBM3 | 5.2TB/s memory bandwidth | 896GB/s Infinity Fabric™ bandwidth | 153B transistors
  • 31. 32 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Leadership generative AI accelerator: AMD Instinct™ MI300X. Up to 1.6x HBM bandwidth and up to 2.4x HBM density compared to Nvidia H100. See Endnotes: MI300-05
  • 32. 33 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC AMD Instinct™ MI300X inference advantage. [Chart: number of GPUs required vs. number of model parameters for Falcon (40B), GPT-3 (175B), PaLM (340B), and PaLM 2 (540B), comparing an 80GB competitor GPU with the 192GB AMD Instinct™ MI300X.] See Endnote: MI300-08
  • 33. 34 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Enabling the ecosystem requires industry-standard, easily deployed solutions
  • 34. 35 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC AMD Instinct™ Platform. Introducing today: 1.5 TB HBM3 memory | industry-standard design | 8x MI300X
  • 35. 36 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC MI300A sampling now; MI300X sampling Q3
  • 36. 37 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC • Features comparable with Nvidia on widely used ML/AI frameworks • Suite of optimized AI benchmarks available to showcase performance • Start now with drop-in replacement for top machine learning frameworks, applications, and benchmarks • Exascale proven: performance, scalability, and reliability • World-class application team to develop, port, and optimize apps • Turnkey software: key HPC open-source & ISV applications, leading AI frameworks • Robust & performant software stack for HPC & AI
  • 37. 38 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
  • 38. 39 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC [No text content recoverable from this slide.]
  • 39. 40 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC • Broad portfolio of training and inference compute engines • Deep ecosystem of AI partners and co-innovation • Open and proven software capabilities
  • 40. 41 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Optimized AI software stack: AI models and algorithms | Libraries | Compilers and tools | Runtime | AMD Instinct™ GPU. AI ecosystem optimized for AMD: leadership performance, a proven software stack. Use of third party marks/logos/products is for informational purposes only and no endorsement of or by AMD is intended or implied. GD-83
  • 41. 42 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC AMD ROCm™ 5: comprehensive suite of data center optimizations. • Broad AI operator support • Out-of-the-box support for leading models and frameworks • Continuous framework integration • Creating an open and portable AI ecosystem, and more… Unlocking the AMD Instinct™ accelerator advantage. Use of third party marks/logos/products is for informational purposes only and no endorsement of or by AMD is intended or implied. GD-83
  • 42. 43 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
    Framework  | Source             | Container           | PIP Wheel
    TensorFlow | GitHub             | Infinity Hub        | pypi.org
    PyTorch    | GitHub             | Infinity Hub        | pytorch.org
    ONNX-RT    | GitHub             | Docker Instructions | onnxruntime.ai
    JAX        | GitHub public fork | Docker Hub          | Est. Q2 2023
    DeepSpeed  | DeepSpeed GitHub   | Docker Hub          | deepspeed.ai
    CuPy       | cupy.dev           | Docker Hub          | cupy.dev
    TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc. PyTorch, the PyTorch logo and any related marks are trademarks of Facebook, Inc. PYTORCH FOUNDATION FOUNDING MEMBER
  • 43. 44 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Powering data center AI at scale: • T5 NLP with 11B parameters • 1st Korean LLM • #11 Explorer WUS3 running AI and HPC workloads • Largest Finnish language model (TurkuNLP-13B) • Allen Institute scientific LLM • #3 LUMI • National Cancer Institute and DOE accelerating cancer research and treatment • #1 Frontier. Use of third party marks/logos/products is for informational purposes only and no endorsement of or by AMD is intended or implied. GD-83
  • 44. AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC [Slide of quoted endorsements from Kevin Scott, Soumith Chintala, Andrew Ng, and Clem Delangue. Note: highlights added by AMD.]
  • 45. AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Open software approach | Proven AI capabilities | Ready support for AI models
  • 46. 47 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC STRONG MOMENTUM AND AN INCREASING LIST OF SUPPORTED APPLICATIONS, LIBRARIES & FRAMEWORKS
    Domains: Life Science, Physics, Chemistry, CFD, Earth Science
    Applications: AMBER, GROMACS, NAMD, LAMMPS, Hoomd-Blue, VASP, MILC, GRID, QUANTUM ESPRESSO, N-Body, CHROMA, PIConGPU, QuickSilver, CP2K, QUDA, NWCHEM, TERACHEM, QMCPACK, OpenFOAM, AMR-WIND, NEKBONE, LAGHOS, NEKO, NEKRS, PeleC, EXAGO, DEVITO, OCCA, SPECFEM3D-GLOBE, SPECFEM3D-CARTESIAN, ACECAST (WRF), MPAS, ICON
    Benchmarks: HPL, HPCG, AMG, ML-TorchBench, ML-SuperBench
    Libraries: AMReX, Ginkgo, HYPRE, TRILINOS
    ML Frameworks: PyTorch, TensorFlow, JAX, ONNX, OpenAI Triton
    ISV Applications: Ansys Mechanical, Cadence Charles, Ansys Fluent*, Siemens STAR-CCM+*, Siemens Calibre* (*porting/optimization in progress)
    + many more
  • 47. 48 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
    • World-leading provider of advanced simulation & modeling software
    • Leading provider of accelerated microscale weather-forecast software
    • Devito: automatic code generation executes optimized, accelerated seismic-inversion workloads
    • High-fidelity CFD software built to tackle engineering flow problems
    • Leading provider of science-based multiscale, multiphysics modeling and simulation applications, optimized for high-performance GPU computing
  • 48. AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC CODE CONVERSION TOOLS: EXTEND YOUR APPLICATION PLATFORM SUPPORT BY CONVERTING CUDA® CODE. Single source | Maintain portability | Maintain performance
    Hipify-perl ◢ Easy to use; point it at a directory and it will hipify the CUDA code ◢ Very simple string-replacement technique; may require manual post-processing ◢ Recommended for quick scans of projects
    Hipify-clang ◢ More robust translation of the code ◢ Generates warnings and assistance for additional analysis ◢ High-quality translation, particularly for cases where the user is familiar with the make system
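As a hand-written illustration of what the string replacement amounts to (a sketch, not output shown in the deck), here is a small program after hipification: each cuda* runtime call maps one-for-one to its hip* counterpart, the header changes, and the kernel body and launch syntax are untouched.

    // Before: #include <cuda_runtime.h>, cudaMalloc, cudaMemset, cudaDeviceSynchronize, cudaFree
    // After hipify (sketch); build with: hipcc axpy.cpp
    #include <hip/hip_runtime.h>

    __global__ void axpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] += a * x[i];                      // kernel body unchanged by hipify
    }

    int main() {
        const int n = 1024;
        float *dx, *dy;
        hipMalloc(&dx, n * sizeof(float));                // was: cudaMalloc
        hipMalloc(&dy, n * sizeof(float));
        hipMemset(dx, 0, n * sizeof(float));              // was: cudaMemset
        hipMemset(dy, 0, n * sizeof(float));
        axpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);  // launch syntax unchanged
        hipDeviceSynchronize();                           // was: cudaDeviceSynchronize
        hipFree(dx); hipFree(dy);                         // was: cudaFree
        return 0;
    }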
  • 49. 50 | AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC FEATURE PARITY WITH COMMONLY USED MATH AND COMMUNICATION LIBRARIES. AMD DEVELOPS EQUIVALENT OPTIMIZED LIBRARIES TO MAXIMIZE PERFORMANCE OF COMMONLY USED HPC AND AI FUNCTIONS
    Category                                     | NVIDIA   | AMD
    Math libraries                               | cuBLAS   | rocBLAS
                                                 | cuBLASLt | hipBLASLt
                                                 | cuFFT    | rocFFT
                                                 | cuSOLVER | rocSOLVER
                                                 | cuSPARSE | rocSPARSE
                                                 | cuRAND   | rocRAND
    Communication library                        | NCCL     | RCCL
    C++ core library                             | Thrust   | rocThrust
                                                 | CUB      | hipCUB
    Wave Matrix Multiply Accumulate library      | WMMA     | rocWMMA
    Deep learning / machine learning primitives  | cuDNN    | MIOpen
    C++ templates abstraction for GEMMs          | CUTLASS  | Composable Kernel
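The parity shows up at the call-site level. Below is a hedged sketch of ours (not from the deck; error handling omitted, and on older ROCm releases the header path may be plain <rocblas.h>) of an SGEMM that mirrors the familiar cuBLAS sequence one call at a time:

    #include <hip/hip_runtime.h>
    #include <rocblas/rocblas.h>     // cuBLAS analogue: <cublas_v2.h>

    int main() {
        const int n = 512;           // C = alpha*A*B + beta*C, all matrices n x n
        float *dA, *dB, *dC;
        hipMalloc(&dA, n * n * sizeof(float));
        hipMalloc(&dB, n * n * sizeof(float));
        hipMalloc(&dC, n * n * sizeof(float));

        rocblas_handle handle;                        // cuBLAS: cublasHandle_t
        rocblas_create_handle(&handle);               // cuBLAS: cublasCreate
        const float alpha = 1.0f, beta = 0.0f;
        rocblas_sgemm(handle,                         // cuBLAS: cublasSgemm
                      rocblas_operation_none, rocblas_operation_none,
                      n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);
        rocblas_destroy_handle(handle);               // cuBLAS: cublasDestroy
        hipFree(dA); hipFree(dB); hipFree(dC);
        return 0;
    }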
  • 50. 51 | AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC LOW-FRICTION SOFTWARE PORTING FOR EXISTING NVIDIA USERS TO AMD
    • ML frameworks (Python): DROP-IN, out-of-the-box support
    • ML libraries (C++): MIRROR, equivalent libraries (cuBLAS, cuSparse, cuFFT, NCCL, cuDNN… map to rocBLAS, rocSparse, rocFFT, RCCL, MIOpen…)
    • ML kernel development (C++, Triton IR): PORT/OPTIMIZE. CUDA custom kernels: NVCC → HIPIFY → HIPCC. Triton kernels: CUDA Triton backend → ROCm Triton backend
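The single-source point deserves a concrete note: a HIP file is plain C++ that hipcc can build for AMD GPUs or route through nvcc for NVIDIA GPUs, so a port keeps one code base. A minimal sketch (our illustration) using HIP's platform macros:

    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        hipGetDeviceCount(&count);   // same API call on both back ends
    #if defined(__HIP_PLATFORM_AMD__)
        std::printf("Built for the AMD/ROCm back end: %d device(s)\n", count);
    #elif defined(__HIP_PLATFORM_NVIDIA__)
        std::printf("Built for the NVIDIA/CUDA back end: %d device(s)\n", count);
    #endif
        return 0;
    }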
  • 51. 52 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Accelerating Instinct™ accelerator adoption. AMD Instinct™ MI200 SUPPORT: 29 key applications & frameworks on Infinity Hub, and a catalog supporting over 90 applications, frameworks & tools. PERFORMANCE RESULTS: published performance results for select apps/benchmarks. Over 17,000 application pulls; 10,000+ since last year. AMD.com/InfinityHub | Instinct Application Catalog
  • 52. AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Engage with AMD ROCm™ platform experts: participate in the AMD ROCm webinar series; post questions and view FAQs in the Community Forum. Get started using AMD ROCm: AMD ROCm documentation on GitHub; download the latest version of AMD ROCm. Increase understanding: purchase the AMD ROCm textbook; view the latest news in the blogs.
  • 53. 54 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Product Brochures Whitepapers Qualified Server Catalog Case Studies Product Videos AMD ROCm Releases Tech Docs & FAQs Compilers, Libraries & Tools AMD ROCm Learning Ctr AMD Meet The Expert (MTE) AMD ROCm AMD Infinity Hub Application Catalogue ML Frameworks Instinct Benchmarks AMD ROCm Community AMD Instinct™ Blogs AMD ROCm Blogs AMD Instinct GPU Newsletter Collateral AMD AIER 2.0 Initiative AMD HPC Fund AMD ROCm™ Info Portal Software Containers & Install Guides Community & News Funds & Initiatives AMD.com/AIER Community.AMD.com AMD.com/InfinityHub DOCs.AMD.com AMD.com/ROCm AMD.com/Instinct AMD.com/HPC-FUND AMD Lab Notes
  • 54. 55 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC TEAMS AVAILABLE TO HELP YOU: POC onboarding | application optimization | post-deployment support
    • AMD MI210 GPUs: AMD Instinct™ MI210 cards (PCIe®) available now for POCs
    • Full systems + GPUs: servers with AMD Instinct MI210 available now for POCs
    • AMD Accelerator Cloud (AAC): remote access to MI100, MI210 & MI250 [AMD.com/AAC]
  • 55. 56 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
    ACADEMIA & RESEARCH • Leadership in perf & perf/W • Large memory capacity enabling more simulation per day • Open SW ecosystem: ROCm™ & CUDA-to-HIP conversion tool to open research possibilities • Benchmarks & apps: HPL, HPCG, PIConGPU, CoMET, MILC
    MATERIAL & LIFE SCIENCE • High floating-point & memory capacity • MI200-series OAM and PCIe® form factors support various system configs to right-size the solution • Catalog of applications on AMD Infinity Hub • Applications: AMBER, NAMD, LAMMPS, OPENMM, GROMACS, RELION
    • Performance leadership & broad SW ecosystem support • Open & portable software ecosystem providing choice to users • Large memory capacity for large model training • Massive I/O for scale-out environments • Deep partnership with Microsoft Azure
    GEOSCIENCE • Leadership perf & perf/W leads to exceptional TCO benefit • Massive memory size & bandwidth for large datasets • CUDA® to HIP conversion tool to simply and quickly port existing GPU applications • Benchmarks & apps: Acoustic ISO, AMG, SpecFem3D
    CLOUD AI/ML • Joint development with leading ML frameworks (PyTorch & TensorFlow) • Large memory per GPU to support training and large-batch inference • High-bandwidth GPU-to-GPU comms to scale out massively parallel models • Models & benchmarks: GPT, BERT, ResNet, VGG
  • 56. 57 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC • MI250X cabinets • MI250X blades • 2U & 4U MI250 rack servers • 1X MI210 blade (20 GPUs in 8U enclosure) • 4U MI210 rack servers (1-8 GPUs) • 2U MI210 rack servers (1-4 GPUs) • https://www.amd.com/system/files/documents/amd-instinct-server-soln-catalog.pdf
  • 57. 58 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC WHY CLIENTS CHOOSE AMD INSTINCT ACCELERATORS
    COMPELLING PERFORMANCE AND TCO ADVANTAGES IN HPC & AI • Highly optimized & innovative compute architecture delivering leadership HPC & AI performance • Industry-leading HBM capacity and bandwidth, critical to most HPC & AI workloads • Strong perf/W and perf/$ advantage for game-changing TCO
    SIMPLIFIED PROGRAMMABILITY AND PORTABILITY • Low-friction portability, simplified programmability & usability to accelerate time to results • Rich suite of performant, open-source (& equivalent) libraries and SW tools for maximized performance • Drop-in support for leading AI frameworks and HPC programming models to ease migration
    GROWING OPEN SOFTWARE ECOSYSTEM FOR HPC & AI • Growing repository of ported/optimized applications via deep partnerships with leading HPC & AI clients • Open-source ROCm software stack & HIP cross-platform programming model • Continued investment & participation in open-source multi-hardware initiatives (OpenMP, OpenXLA, Triton, MLIR)
  • 58. 59 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC
  • 59. 60 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Learn more about AMD Instinct™ MI200 Accelerators and ROCm™ 5 Powering Discoveries at Exascale AMD.COM/INSTINCT
  • 61. 62 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC AMD Instinct™ MI250 Brochure AMD Instinct™ MI210 Brochure AMD CDNA™ 2 White Paper AMD ROCm™ 5 Brochure AMD Instinct™ Accelerator Qualified Servers GPU Accelerated Applications Catalog
  • 62. 63 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC • NEW: HPC Comes to Life with AMD Instinct™ Accelerators and NAMD • Enhancing LAMMPS Simulations with AMD Instinct™ Accelerators • Breaking Barriers in Plasma Physics with PIConGPU and AMD Instinct™ MI250 GPU
  • 63. 64 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC Accelerated Computing with HIP by Yifan Sun, Trinayan Baruah, David R Kaeli Available through Barnes and Noble ISBN-13: 9798218107444
  • 64. 65 AMD INSTINCT ACCELERATORS UPDATE | Q2 2023 | PUBLIC DISCLAIMER AND ATTRIBUTIONS ©2023 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Instinct, EPYC, Infinity Fabric, ROCm, and combinations thereof are trademarks of Advanced Micro Devices, Inc. PCIe is a registered trademark of PCI-SIG Corporation. OpenCL™ is a registered trademark used under license by Khronos. The OpenMP name and the OpenMP logo are registered trademarks of the OpenMP Architecture Review Board. TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc. PyTorch, the PyTorch logo and any related marks are trademarks of Facebook, Inc. ANSYS, and any and all ANSYS, Inc. brand, product, service and feature names, logos and slogans are registered trademarks or trademarks of ANSYS, Inc. or its subsidiaries in the United States or other countries. OPENFOAM® is a registered trademark of OpenCFD Limited, producer and distributor of the OpenFOAM software via www.openfoam.com. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18
ENDNOTES

MI200-01 - World's fastest data center GPU is the AMD Instinct™ MI250X. Calculations conducted by AMD Performance Labs as of Sep 15, 2021, for the AMD Instinct™ MI250X (128GB HBM2e OAM module) accelerator at 1,700 MHz peak boost engine clock resulted in 95.7 TFLOPS peak theoretical double precision (FP64 Matrix), 47.9 TFLOPS peak theoretical double precision (FP64), 95.7 TFLOPS peak theoretical single precision matrix (FP32 Matrix), 47.9 TFLOPS peak theoretical single precision (FP32), 383.0 TFLOPS peak theoretical half precision (FP16), and 383.0 TFLOPS peak theoretical Bfloat16 format precision (BF16) floating-point performance. Calculations conducted by AMD Performance Labs as of Sep 18, 2020 for the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak theoretical double precision (FP64), 46.1 TFLOPS peak theoretical single precision matrix (FP32), 23.1 TFLOPS peak theoretical single precision (FP32), and 184.6 TFLOPS peak theoretical half precision (FP16) floating-point performance. Published results on the NVIDIA Ampere A100 (80GB) GPU accelerator, boost engine clock of 1410 MHz, resulted in 19.5 TFLOPS peak double precision tensor cores (FP64 Tensor Core), 9.7 TFLOPS peak double precision (FP64), 19.5 TFLOPS peak single precision (FP32), 78 TFLOPS peak half precision (FP16), 312 TFLOPS peak half precision (FP16 Tensor Core), 39 TFLOPS peak Bfloat16 (BF16), and 312 TFLOPS peak Bfloat16 format precision (BF16 Tensor Core) theoretical floating-point performance. The TF32 data format is not IEEE compliant and not included in this comparison. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf, page 15, Table 1.

MI200-07 - Calculations conducted by AMD Performance Labs as of Sep 21, 2021, for the AMD Instinct™ MI250X and MI250 (128GB HBM2e) OAM accelerators designed with AMD CDNA™ 2 6nm FinFET process technology at 1,600 MHz peak memory clock resulted in 128GB HBM2e memory capacity and 3.2768 TB/s peak theoretical memory bandwidth performance. MI250/MI250X memory bus interface is 4,096 bits times 2 die and memory data rate is 3.20 Gbps for total memory bandwidth of 3.2768 TB/s ((3.20 Gbps*(4,096 bits*2))/8); see the worked sketch following these notes. The highest published results on the NVIDIA Ampere A100 (80GB) SXM GPU accelerator resulted in 80GB HBM2e memory capacity and 2.039 TB/s GPU memory bandwidth performance. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet-us-nvidia-1758950-r4-web.pdf

MI200-18 - Calculations conducted by AMD Performance Labs as of Sep 21, 2021, for the AMD Instinct™ MI250X and MI250 accelerators (OAM) designed with CDNA™ 2 6nm FinFET process technology at 1,600 MHz peak memory clock resulted in 128GB HBM2e memory capacity. Published specifications on the NVIDIA Ampere A100 (80GB) SXM and A100 (PCIe®) accelerators showed 80GB memory capacity. Results found at: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet-us-nvidia-1758950-r4-web.pdf

MI200-41 - Calculations conducted by AMD Performance Labs as of Jan 14, 2022, for the AMD Instinct™ MI210 (64GB HBM2e PCIe® card) accelerator at 1,700 MHz peak boost engine clock resulted in 45.3 TFLOPS peak theoretical double precision (FP64 Matrix), 22.6 TFLOPS peak theoretical double precision (FP64), and 181.0 TFLOPS peak theoretical Bfloat16 format precision (BF16) floating-point performance. Calculations conducted by AMD Performance Labs as of Sep 18, 2020 for the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak theoretical double precision (FP64) and 184.6 TFLOPS peak theoretical half precision (FP16) floating-point performance. Published results on the NVIDIA Ampere A100 (80GB) GPU accelerator, boost engine clock of 1410 MHz, resulted in 19.5 TFLOPS peak double precision tensor cores (FP64 Tensor Core), 9.7 TFLOPS peak double precision (FP64), and 39 TFLOPS peak Bfloat16 format precision (BF16) theoretical floating-point performance. The TF32 data format is not IEEE compliant and not included in this comparison. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf, page 15, Table 1.

MI200-44 - Calculations conducted by AMD Performance Labs as of Feb 15, 2022, for the AMD Instinct™ MI210 (64GB HBM2e PCIe® card) 300 Watt accelerator at 1,700 MHz peak boost engine clock resulted in 45.3 TFLOPS peak theoretical double precision (FP64 Matrix), 22.6 TFLOPS peak theoretical double precision (FP64), 22.6 TFLOPS peak theoretical single precision (FP32), and 181.0 TFLOPS peak theoretical Bfloat16 format precision (BF16) floating-point performance. Calculations conducted by AMD Performance Labs as of Sep 18, 2020 for the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak theoretical double precision (FP64) floating-point performance. Published results on the NVIDIA Ampere A100 (80GB) 300 Watt PCIe® GPU accelerator, boost engine clock of 1410 MHz, resulted in 19.5 TFLOPS peak double precision tensor cores (FP64 Tensor Core), 9.7 TFLOPS peak double precision (FP64), 19.5 TFLOPS peak single precision (FP32), 78 TFLOPS peak half precision (FP16), and 39 TFLOPS peak Bfloat16 format precision (BF16) theoretical floating-point performance. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf, page 15, Table 1.

MI200-46 - Testing conducted by AMD Performance Lab on a 2P socket 3rd Gen AMD EPYC™ 7763 CPU Supermicro 4124 with 8x AMD Instinct™ MI210 GPUs (PCIe® 64GB 300W), no AMD Infinity Fabric™ technology enabled. OS: Ubuntu 18.04.6 LTS, ROCm 5.0. Benchmark: Relion v3.1.2, converted with HIP with AMD optimizations to Relion that are not yet available upstream, vs. NVIDIA published measurements: https://developer.nvidia.com/hpc-application-performance accessed 3/13/2022. NVIDIA container details found at: https://catalog.ngc.nvidia.com/orgs/hpc/containers/relion; information on Relion found at: https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install. All results measured on systems with 8 GPUs, with 4-GPU xGMI bridges where applicable. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
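For readers who want to reproduce the arithmetic behind MI200-01 and MI200-07, the minimal sketch below recomputes the quoted figures. The bandwidth formula is taken verbatim from MI200-07; the peak-TFLOPS formula is the standard clock × compute-units × FLOPs-per-CU-per-clock estimate, and the MI250X compute-unit count (220) and per-CU issue rates used here are assumptions supplied for illustration, not figures stated in these endnotes.

```python
def peak_bandwidth_tb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in TB/s: (data rate * bus width) / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8 / 1000

# MI200-07: 3.20 Gbps data rate, 4,096-bit interface per die, 2 dies.
print(f"MI250/MI250X: {peak_bandwidth_tb_s(3.20, 4096 * 2):.4f} TB/s")  # 3.2768

def peak_tflops(clock_ghz: float, compute_units: int, flops_per_cu_clk: int) -> float:
    """Peak theoretical TFLOPS: boost clock * CU count * FLOPs per CU per clock."""
    return clock_ghz * compute_units * flops_per_cu_clk / 1000

# MI200-01: MI250X at 1,700 MHz peak boost. The 220-CU count and per-CU FLOP
# rates below (128 FP64 vector, 256 FP64 matrix, 1,024 FP16/BF16) are
# assumptions for illustration, not figures from the endnotes.
for label, rate in [("FP64", 128), ("FP64 Matrix", 256), ("FP16/BF16", 1024)]:
    print(f"MI250X {label}: {peak_tflops(1.7, 220, rate):.1f} TFLOPS")
# -> 47.9, 95.7, and 383.0 TFLOPS, matching the MI200-01 figures
```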
ENDNOTES (CONT.)

MI200-47A - Testing conducted on 10.03.2022 by AMD Performance Lab on a 2P socket AMD EPYC™ 7763 CPU Supermicro 4124 with 4x and 8x AMD Instinct™ MI210 GPUs (PCIe® 64GB, 300W) with AMD Infinity Fabric™ technology enabled. SBIOS 2.2, Ubuntu® 18.04.6 LTS, host ROCm™ 5.2.0, LAMMPS container amdih-2022.5.04_130 (ROCm 5.1), versus NVIDIA public claims for 4x and 8x A100 PCIe 80GB GPUs: http://web.archive.org/web/20220718053400/https://developer.nvidia.com/hpc-application-performance. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-49A - Testing conducted on 11.14.2022 by AMD Performance Lab on a 2P socket AMD EPYC™ 7763 CPU Supermicro 4124 with 8x AMD Instinct™ MI210 GPUs (PCIe® 64GB, 300W), AMD Infinity Fabric™ technology enabled. Ubuntu 18.04.6 LTS, host ROCm 5.2.0, rocHPL 6.0.0. Results calculated from medians of five runs, vs. testing conducted by AMD Performance Lab on a 2P socket AMD EPYC™ 7763 Supermicro 4124 with 8x NVIDIA A100 GPUs (PCIe 80GB 300W), Ubuntu 18.04.6 LTS, CUDA 11.6, HPL NVIDIA container image 21.4-HPL. All results measured on systems configured with 8 GPUs: 2 pairs of 4x MI210 GPUs connected by a 4-way Infinity Fabric™ link bridge; 4 pairs of 2x A100 80GB PCIe GPUs connected by 2-way NVLink bridges. Information on HPL: https://www.netlib.org/benchmark/hpl/. AMD HPL container details: HPL container not yet available on Infinity Hub. NVIDIA HPL container details: https://ngc.nvidia.com/catalog/containers/nvidia:hpc-benchmarks. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-69A - Testing conducted by AMD Performance Lab 11.14.2022 using HPL comparing two systems: 2P EPYC™ 7763 powered server, SMT disabled, with 1x, 2x, and 4x AMD Instinct™ MI250 (128 GB HBM2e) 560W GPUs, host ROCm 5.2.0, rocHPL 6.0.0 (AMD HPL container is not yet available on Infinity Hub) vs. 2P AMD EPYC™ 7742 server, SMT enabled, with 1x, 2x, and 4x NVIDIA Ampere A100 80GB SXM 400W GPUs, CUDA 11.6 and Driver Version 510.47.03, HPL container (nvcr.io/nvidia/hpc-benchmarks:21.4-hpl) obtained from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-70A - Testing conducted by AMD Performance Lab 11.2.2022 using HPCG 3.0 comparing two systems: 2P EPYC™ 7763 powered server, SMT disabled, with 1x, 2x, and 4x AMD Instinct™ MI250 (128 GB HBM2e) 560W GPUs, SBIOS M12, Ubuntu 20.04.4, host ROCm 5.2.0 (AMD HPCG 3.0 container is not yet available on Infinity Hub) vs. 2P AMD EPYC™ 7742 server with 1x, 2x, and 4x NVIDIA Ampere A100 80GB SXM 400W GPUs, SBIOS 0.34, Ubuntu 20.04.4, CUDA 11.6, HPCG 3.0 container nvcr.io/nvidia/hpc-benchmarks:21.4-hpcg at https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
MI200-72 - SuperBench v0.6 model training comparison based on AMD internal testing as of 11/09/2022 measuring the total training throughput using a 2P AMD EPYC™ 7763 CPU server tested with 1x AMD Instinct™ MI250 (128GB HBM2e) 560W GPU with Infinity Fabric™ technology, ROCm™ 5.2, PyTorch 1.12, versus a 2P EPYC™ 7742 CPU server with 1x NVIDIA Ampere A100 (80GB HBM2e) 400W SXM GPU, CUDA® 11.6, PyTorch 1.8.0. Server manufacturers may vary configurations, yielding different results. Performance may vary based on factors including use of latest drivers and optimizations.

MI200-73 - Testing conducted by AMD Performance Lab 8.26.22 using AMBER: Cellulose_production_NPT_4fs, Cellulose_production_NVE_4fs, FactorIX_production_NPT_4fs, FactorIX_production_NVE_4fs, STMV_production_NPT_4fs, STMV_production_NVE_4fs, JAC_production_NPT_4fs, and JAC_production_NVE_4fs, comparing two systems: 2P EPYC™ 7763 powered server with 1x AMD Instinct™ MI250 (128GB HBM2e) 560W GPU, ROCm 5.2.0, AMBER container 22.amd_100 vs. 2P EPYC™ 7742 powered server with 1x NVIDIA A100 SXM (80GB HBM2e) 400W GPU, MPS enabled (2 instances), CUDA 11.6. Server manufacturers may vary configurations, yielding different results. Performance may vary based on factors including use of latest drivers and optimizations.

MI200-74 - Testing conducted by AMD Performance Lab 10.18.22 using Gromacs: STMV, comparing two systems: 2P EPYC™ 7763 powered server with 4x AMD Instinct™ MI250 (128GB HBM2e) 560W GPUs with Infinity Fabric™ technology, ROCm™ 5.2.0, Gromacs container 2022.3.amd1_174 vs. NVIDIA public claims at https://developer.nvidia.com/hpc-application-performance (Gromacs 2022.2): dual EPYC 7742 with 4x NVIDIA Ampere A100 SXM 80GB GPUs. Server manufacturers may vary configurations, yielding different results. Performance may vary based on factors including use of latest drivers and optimizations.

MI200-75 - Testing conducted by AMD Performance Lab 9.13.22 using OpenMM at double precision: gbsa, rf, pme, amoebagk, amoebapme, apoa1rf, apoa1pme, apoa1ljpme, comparing two systems: 2P EPYC™ 7763 powered server with 1x AMD Instinct™ MI250 with AMD Infinity Fabric™ technology (128GB HBM2e) 560W GPU, ROCm 5.2.0, OpenMM container 7.7.0_49 vs. 2P EPYC™ 7742 powered server with 1x NVIDIA A100 SXM (80GB HBM2e) 400W GPU, MPS enabled (2 instances), CUDA 11.6, OpenMM 7.7.0. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-76 - Testing conducted by AMD Performance Lab 9.13.22 using NAMD: STMV_NVE, APOA1_NVE, comparing two systems: 2P EPYC™ 7763 powered server with 1x, 2x, and 4x AMD Instinct™ MI250 (128GB HBM2e) 560W GPUs with AMD Infinity Fabric™ technology, ROCm 5.2.0, NAMD container namd3:3.0a9 vs. NVIDIA public claims of performance of a 2P EPYC 7742 server with 1x, 2x, and 4x NVIDIA Ampere A100 80GB SXM GPUs: https://developer.nvidia.com/hpc-application-performance (v2.15a AVX-512). Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
ENDNOTES (CONT.)

MI200-77 - Testing conducted by AMD Performance Lab 10.03.22 using LAMMPS: LJ, ReaxFF, and Tersoff, comparing two systems: 2P EPYC™ 7763 powered server with 4x AMD Instinct™ MI250 (128GB HBM2e) 560W GPUs, ROCm 5.2.0, LAMMPS container amdih/lammps:2022.5.04_130 vs. NVIDIA public claims: http://web.archive.org/web/20220718053400/https://developer.nvidia.com/hpc-application-performance. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-78 - Testing conducted by AMD Performance Lab on 9/10/2022 using Quicksilver (https://github.com/LLNL/Quicksilver.git) comparing two systems: 2P socket AMD EPYC™ 7763 CPU Supermicro 4124 with 8x AMD Instinct™ MI210 GPUs (PCIe® 64GB, 300W), AMD Infinity Fabric™ technology enabled, Ubuntu 18.04.6 LTS, host ROCm 5.2.0 vs. 2P socket AMD EPYC™ 7763 Supermicro 4124 with 8x NVIDIA A100 GPUs (PCIe 80GB 300W), Ubuntu 18.04.6 LTS, CUDA 11.6. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-79 - Testing conducted by AMD Performance Lab on 8/12/2022 using AMG Solve comparing two systems: 2P EPYC™ 7763 powered server, SMT disabled, with 1x, 2x, and 4x AMD Instinct™ MI250 (128 GB HBM2e) 560W GPUs with Infinity Fabric technology, SBIOS M12, Ubuntu 20.04.4, host ROCm 5.2.0 vs. 2P AMD EPYC™ 7742 server, SMT enabled, with 1x, 2x, and 4x NVIDIA Ampere A100 80GB SXM 400W GPUs, SBIOS 0.34, Ubuntu 20.04.4, CUDA 11.6. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-80 - HuggingFace Transformers training throughput comparison based on AMD internal testing as of 1/17/2023 using a 2P AMD EPYC™ 7763 CPU server tested with 4x AMD Instinct™ MI250 (128GB HBM2e) 560W GPUs with Infinity Fabric™ technology, ROCm™ 5.4, PyTorch 1.12.1, base image rocm/pytorch:rocm5.4_ubuntu20.04_py3.8_pytorch_1.12.1, versus a 2P EPYC™ 7742 CPU server with 4x NVIDIA Ampere A100 (80GB HBM2e) 400W SXM GPUs, CUDA® 11.8, PyTorch 1.13.0a0, base image nvcr.io/nvidia/pytorch:22.11-py3. Server manufacturers may vary configurations, yielding different results. Performance may vary based on factors including use of latest drivers and optimizations.

MI200-81 - HPL-AI comparison based on AMD internal testing as of 11/2/2022 measuring the HPL-AI benchmark performance (TFLOPS) using a server with 2x EPYC™ 7763 with 4x MI250 (128GB HBM2e) with Infinity Fabric running host ROCm™ 5.2.0, HPL-AI-AMD v1.0.0 (AMD HPL-AI container not yet available on Infinity Hub), versus a server with 2x EPYC 7742 with 4x A100 SXM (80GB HBM2e) running CUDA® 11.6, HPL-AI-NVIDIA v2.0.0, container nvcr.io/nvidia/hpc-benchmarks:21.4-hpl. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-82 - Testing conducted by AMD Performance Lab on 11/25/2022 using PyFR TGV and NACA 0021 comparing two systems: 2P EPYC™ 7763 powered server, SMT disabled, with 1x AMD Instinct™ MI250 (128 GB HBM2e) 560W GPU with AMD Infinity Fabric™ technology, SBIOS M12, Ubuntu 20.04.4, host ROCm 5.2.0 vs. 2P AMD EPYC™ 7742 server, SMT enabled, with 1x NVIDIA Ampere A100 80GB SXM 400W GPU, SBIOS 0.34, Ubuntu 20.04.4, CUDA 11.6. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-85 - Testing conducted on 10.18.2022 by AMD Performance Lab on a 2P socket AMD EPYC™ 7763 CPU Supermicro 4124 with 4x and 8x AMD Instinct™ MI210 GPUs (PCIe® 64GB, 300W) with AMD Infinity Fabric™ technology enabled. SBIOS 2.2, Ubuntu® 18.04.6 LTS, host ROCm™ 5.2.0, Gromacs container 2022.3.amd1_174 (ROCm 5.3), versus NVIDIA public claims for 4x and 8x A100 PCIe 80GB: https://developer.nvidia.com/hpc-application-performance (Gromacs 2022.3). Server manufacturers may vary configurations, yielding different results. Performance may vary based on factors including use of latest drivers and optimizations.

MI200-86 - 2P 3rd Gen AMD EPYC™ 7763 CPU server with 1x AMD Instinct™ MI210 (64 GB HBM2e, PCIe®) GPU performance running OpenMM at double precision versus a 2P 3rd Gen AMD EPYC™ 7763 CPU server with 1x NVIDIA Ampere A100 80GB PCIe® (300W): gbsa is ~1.1x (~10%) faster, amoebagk is ~1.0x (comparable), and amoebapme is ~1.1x (~10%) faster. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-87 - Testing conducted on 8.12.2022 by AMD Performance Lab on a 2P socket AMD EPYC™ 7763 CPU production platform with 1x, 2x, 4x, and 8x AMD Instinct™ MI210 GPUs (PCIe® 64GB, 300W) with AMD Infinity Fabric™ technology enabled, SBIOS 2.2, Ubuntu® 18.04.6 LTS, ROCm™ 5.0.0, versus a 2P socket AMD EPYC™ 7763 production platform with 1x, 2x, 4x, and 8x NVIDIA A100 GPUs (PCIe 80GB 300W) with NVLink technology enabled, SBIOS 2.3a, Ubuntu 18.04.6 LTS, CUDA® 11.6 Update 2. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-88 - Testing conducted by AMD Performance Lab between 10.03.2022 and 11.14.2022 using HPL, Gromacs STMV, and LAMMPS ReaxFF comparing two systems: 2P AMD EPYC™ 7763 CPU production platform with 2x, 4x, and 8x AMD Instinct™ MI210 GPUs (PCIe® 64GB, 300W), AMD Infinity Fabric™ technology enabled, Ubuntu 18.04.6 LTS, versus a 2P AMD EPYC™ 7763 CPU production platform with 1x, 2x, and 4x AMD Instinct™ MI250 (128 GB HBM2e) 560W GPUs, AMD Infinity Fabric™ technology enabled, Ubuntu 20.04.4 LTS. Both systems used host ROCm 5.2.0, rocHPL 6.0.0 (ROCm 5.3), Gromacs container 2022.3.amd1_174 (ROCm 5.3), and LAMMPS container amdih-2022.5.04_130 (ROCm 5.1). Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-90 - Testing conducted by AMD Performance Lab on 04/14/2023 using OpenFOAM v2206 on a 2P EPYC 7763 CPU production server with 1x, 2x, and 4x AMD Instinct™ MI250 GPUs (128GB, 560W) with AMD Infinity Fabric™ technology enabled, ROCm™ 5.3.3, Ubuntu® 20.04.4, vs a 2P EPYC 7742 CPU production server with 1x, 2x, and 4x NVIDIA A100 80GB SXM GPUs (400W) with NVLink technology enabled, CUDA® 11.8, Ubuntu 20.04.4. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.

MI200-91 - Testing conducted by AMD Performance Lab on 04/14/2023 using OpenFOAM v2206 on a 2P EPYC 7763 CPU production server with 1x, 2x, 4x, and 8x AMD Instinct™ MI210 GPUs (PCIe® 64GB, 300W) with AMD Infinity Fabric™ technology enabled, ROCm 5.3.3, Ubuntu 20.04.4, vs a 2P EPYC 7763 CPU production server with 1x, 2x, 4x, and 8x NVIDIA A100 80GB PCIe GPUs (300W) with NVLink technology enabled, CUDA 11.8, Ubuntu 20.04.4.
Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
ENDNOTES (CONT.)

MI300-003 - Measurements by AMD Performance Labs June 4, 2022 on current specification and/or estimation for estimated delivered FP8 floating-point performance with structured sparsity supported for AMD Instinct™ MI300 vs. MI250X FP16 (306.4 estimated delivered TFLOPS based on 80% of peak theoretical floating-point performance). MI300 performance based on preliminary estimates and expectations. Final performance may vary.

MI300-004 - Measurements by AMD Performance Labs June 4, 2022. MI250X (560W) FP16 (306.4 estimated delivered TFLOPS based on 80% of peak theoretical floating-point performance). MI300 FP8 performance based on preliminary estimates and expectations. MI300 TDP power based on preliminary projections. Actual results based on production silicon may vary.

MI300-005 - Calculations conducted by AMD Performance Labs as of May 17, 2023, for the AMD Instinct™ MI300X OAM accelerator 750W (192 GB HBM3) designed with AMD CDNA™ 3 5nm FinFET process technology resulted in 192 GB HBM3 memory capacity and 5.218 TB/s sustained peak memory bandwidth performance. MI300X memory bus interface is 8,192 bits and memory data rate is 5.6 Gbps for total sustained peak memory bandwidth of 5.218 TB/s ((8,192 bits memory bus interface * 5.6 Gbps memory data rate / 8) * 0.91 delivered adjustment). The highest published results on the NVIDIA Hopper H100 (80GB) SXM GPU accelerator resulted in 80GB HBM3 memory capacity and 3.35 TB/s GPU memory bandwidth performance.

MI300-07K - Measurements by internal AMD Performance Labs as of June 2, 2023 on current specifications and/or internal engineering calculations. Large Language Model (LLM) run or calculated with FP16 precision to determine the minimum number of GPUs needed to run the Falcon (40B parameter) models. Tested result configuration: AMD lab system consisting of 1x EPYC 9654 (96-core) CPU with 1x AMD Instinct™ MI300X (192GB HBM3, OAM module) 750W accelerator tested at FP16 precision. Server manufacturers may vary configuration offerings, yielding different results.

MI300-08K - Measurements by internal AMD Performance Labs as of June 2, 2023 on current specifications and/or internal engineering calculations. Large Language Model (LLM) run comparisons with FP16 precision to determine the minimum number of GPUs needed to run the Falcon (40 billion parameters), GPT-3 (175 billion parameters), PaLM 2 (340 billion parameters), and PaLM (540 billion parameters) models. Calculated estimates based on GPU-only memory size versus memory required by the model at defined parameters plus 10% overhead. Calculations rely on published and sometimes preliminary model memory sizes. Tested result configuration: AMD lab system consisting of 1x EPYC 9654 (96-core) CPU with 1x AMD Instinct™ MI300X (192GB HBM3, OAM module) 750W accelerator, vs. competitive testing done on a comparable Cirrascale Cloud Services instance with permission. Results (FP16 precision):

Model       Parameters    Total Mem. Req'd   MI300X Req'd      Competition Req'd
Falcon-40B  40 billion    88 GB              1 (actual)        2 (actual)
GPT-3       175 billion   385 GB             3 (calculated)    5 (calculated)
PaLM 2      340 billion   748 GB             4 (calculated)    10 (calculated)
PaLM        540 billion   1,188 GB           7 (calculated)    15 (calculated)

Calculated estimates may vary based on final model size; actual and estimated results may vary due to actual overhead required and use of system memory beyond that of the GPU. Server manufacturers may vary configuration offerings, yielding different results.
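The MI300-005 bandwidth figure and the MI300-08K GPU counts can likewise be reproduced with a few lines. The sketch below is a minimal illustration assuming FP16 weights take 2 bytes per parameter and treating "GB" as decimal gigabytes; the 10% overhead, the 192 GB MI300X capacity, and the 80 GB competitive capacity come from the endnotes above.

```python
import math

# MI300-005: (8,192-bit bus * 5.6 Gbps / 8) * 0.91 delivered adjustment.
mi300x_bw = 8192 * 5.6 / 8 / 1000 * 0.91
print(f"MI300X sustained peak bandwidth: {mi300x_bw:.3f} TB/s")  # ~5.218

# MI300-003/-004: 306.4 delivered TFLOPS is 80% of the 383.0 peak FP16 figure.
print(f"MI250X estimated delivered FP16: {383.0 * 0.80:.1f} TFLOPS")  # 306.4

# MI300-08K: minimum GPUs from memory footprint alone. FP16 = 2 bytes per
# parameter (assumption), plus the 10% overhead stated in the endnote.
def min_gpus(params_billion: float, gpu_mem_gb: float) -> int:
    required_gb = params_billion * 2 * 1.10
    return math.ceil(required_gb / gpu_mem_gb)

for model, params in [("Falcon-40B", 40), ("GPT-3", 175),
                      ("PaLM 2", 340), ("PaLM", 540)]:
    required = params * 2 * 1.10
    print(f"{model}: {required:,.0f} GB -> "
          f"MI300X (192GB): {min_gpus(params, 192)}, "
          f"competition (80GB): {min_gpus(params, 80)}")
# Reproduces the MI300-08K table: 88/385/748/1,188 GB and 1/3/4/7 vs 2/5/10/15
```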