AMD's EPYC architecture has paved the way toward heterogeneous, data-centric computing, but it is still limited by its parallel DDR interfaces. This presentation shows the potential of the EPYC architecture if it adopted the Open Memory Interface (OMI) as its near-memory interface.
If AMD Adopted OMI in their EPYC Architecture
1. If AMD Adopted OMI in their EPYC Architecture
Allan Cantle - 5/25/2021
2. Heterogeneous Computing’s Memory Challenge
• Today’s HPC, HPDA & ML applications need Heterogeneous Computing
• Heterogeneous Processors / Accelerators have varying memory needs
• CPU -> Low Latency & Cache Line Random Access Bandwidth
• GPU -> HBM Bandwidths
• AI/ML -> High Bandwidth and Capacity
• FPGA -> High Streaming Bandwidth
• The challenge: can one Near/Local Memory bus support all of these requirements?
3. Memory Interface Comparison
OMI - Bandwidth of HBM at DDR Latency, Capacity & Cost
[Chart: DRAM Capacity (TBytes, log scale, 0.001-10) vs. Memory Bandwidth (TBytes/s, log scale, 0.01-10) for DDR4/DDR5, OMI and HBM2E; OMI approaches HBM2E bandwidth while matching DDR4/DDR5 capacity.]
4. Memory Interface Comparison

Specification              | LRDIMM DDR4      | DDR5             | HBM2E (8-High)  | OMI
---------------------------|------------------|------------------|-----------------|------------------
Protocol                   | Parallel         | Parallel         | Parallel        | Serial
Signalling                 | Single-Ended     | Single-Ended     | Single-Ended    | Differential
I/O Type                   | Duplex           | Duplex           | Simplex         | Simplex
Lanes/Channel (Read/Write) | 64               | 32               | 512R/512W       | 8R/8W
Lane Speed                 | 3,200 MT/s       | 6,400 MT/s       | 3,200 MT/s      | 32,000 MT/s
Channel Bandwidth (R+W)    | 25.6 GBytes/s    | 25.6 GBytes/s    | 400 GBytes/s    | 64 GBytes/s
Latency                    | 41.5 ns          | ?                | 60.4 ns         | 45.5 ns
Driver Area / Channel      | 7.8 mm2          | 3.9 mm2          | 11.4 mm2        | 2.2 mm2
Bandwidth / mm2            | 3.3 GBytes/s/mm2 | 6.6 GBytes/s/mm2 | 35 GBytes/s/mm2 | 29.6 GBytes/s/mm2
Max Capacity / Channel     | 64 GB            | 256 GB           | 16 GB           | 256 GB
Connection                 | Multi-Drop       | Multi-Drop       | Point-to-Point  | Point-to-Point
Data Resilience            | Parity           | Parity           | Parity          | CRC
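The channel-bandwidth row follows directly from lanes x lane speed; a minimal sketch of the arithmetic (Python; all figures from the table above, with HBM2E coming out at 409.6 GBytes/s, which the slide rounds to 400):

```python
# Channel bandwidth = total lanes * lane speed / 8 bits-per-byte.
# All figures come from the comparison table above.
interfaces = {
    #                (read lanes, write lanes, MT/s per lane)
    "LRDIMM DDR4":   (64, 0, 3200),     # duplex: 64 lanes shared R/W
    "DDR5":          (32, 0, 6400),     # duplex: 32 lanes shared R/W
    "HBM2E 8-High":  (512, 512, 3200),  # simplex: separate R and W lanes
    "OMI":           (8, 8, 32000),     # simplex: separate R and W lanes
}

for name, (rd, wr, mts) in interfaces.items():
    gbytes = (rd + wr) * mts / 8 / 1000  # Mbit/s -> GBytes/s
    print(f"{name:13s} {gbytes:6.1f} GBytes/s per channel")
```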
5. AMD EPYC Rome CPU
58.5mm x 75.4mm, 1mm pitch, LGA 4094 Socket SP3
IO Die: 15.06mm x 27.63mm, 12nm GF Process Node
[Die plot, scale 1mm : 10pts, with AMD CPU Dies, AMD GPU Dies, Xilinx FPGA Dies and AI Dies overlaid for comparison.]
6. AMD EPYC Rome IO Die Analysis
AMD EPYC Rome IO Die**: 8.34B transistors on TSMC 14nm/12nm? - 416mm2
The die shot measures 800pts x 436pts, an aspect ratio of 1.8349, so with X = 1.8349 * Y:
Y * 1.8349Y = 416mm2 -> Y ≈ 15.06mm, X ≈ 27.63mm (scale 1mm : 20pts)
DDR4 Memory Controller Area: 4 Channels, 2.2mm x 14.2mm = 31.24mm2
Per Channel: 1.1mm x 7.1mm = 7.81mm2
Peak Bandwidth / Channel = 3,200 MT/s * 8 Bytes = 25.6 GBytes/s
Peak Bandwidth per Channel Area = 25.6 GBytes/s / 7.81mm2 ≈ 3.28 GBytes/s/mm2
Maximum Capacity per DDR4 DIMM = 64GB
** https://wccftech.com/amd-2nd-gen-epyc-rome-iod-ccd-chipshots-39-billion-transistors/
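The die-size estimate works back from the die-shot aspect ratio; a minimal sketch (Python, using the slide's 416mm2 area and 800pt x 436pt die-shot measurements):

```python
import math

# Edge lengths from the die-shot aspect ratio and the known die area.
area_mm2 = 416.0              # published EPYC Rome IO die area
shot_x, shot_y = 800, 436     # die-shot size in points
aspect = shot_x / shot_y      # ~1.8349

y = math.sqrt(area_mm2 / aspect)   # solve Y * (1.8349 * Y) = 416
x = aspect * y
print(f"IO die ~ {x:.2f}mm x {y:.2f}mm")          # ~27.63mm x 15.06mm

# DDR4 memory-controller bandwidth density per channel.
chan_area = 1.1 * 7.1              # mm2 per channel from the die shot
chan_bw = 3200 * 8 / 1000          # 3,200 MT/s * 8 Bytes = 25.6 GBytes/s
print(f"{chan_bw / chan_area:.2f} GBytes/s/mm2")  # ~3.28
```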
8. Mockup of AMD EPYC Genoa CPU?
LGA 6096 Socket - 75.4mm? x 75.4mm?, 0.92mm? pitch
WCCFTECH - Hardware Leak
TDP 120W to 320W, configurable up to 400W
Targeting 7nm
[Mockup rendering, scale 1mm : 10pts]
13. AMD EPYC IO Die with OMI - Benefits
• Balanced Memory Bandwidth to Infinity Fabric & PCIe Bandwidth
• Maintain EPYC Rome CPU LGA-4094 Socket size or smaller
• OMI DDIMM uses 1/4 the pin count of a DDR DIMM Channel
• 24 OMI Channels fit into the space of 6 DDR Channels (see the sketch after this list)
• Easier Motherboard routing - Fewer layers, lower cost
• Memory becomes composable with a SerDes interface
• Memory Technology Agnostic
• e.g. LPDDR5 OMI DDIMM for improved Power & Better Random Access
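A back-of-envelope view of what that 1/4 pin count buys (a minimal Python sketch; the normalized pin counts restate the bullet above, and the per-channel bandwidths come from the slide-4 table, using the DDR4 LRDIMM figure for the DDR side):

```python
# Same pin budget: 6 DDR DIMM channels vs 24 OMI channels.
ddr_pins = 1.0          # normalized DDR DIMM channel pin count
omi_pins = 0.25         # OMI DDIMM uses ~1/4 of the pins (bullet above)

ddr_chans, omi_chans = 6, 24
assert ddr_chans * ddr_pins == omi_chans * omi_pins   # equal pin budget

ddr_bw = ddr_chans * 25.6   # GBytes/s per DDR4 LRDIMM channel (slide 4)
omi_bw = omi_chans * 64     # GBytes/s per OMI channel (slide 4)
print(f"DDR: {ddr_bw:.1f} GBytes/s vs OMI: {omi_bw} GBytes/s "
      f"({omi_bw / ddr_bw:.0f}x) in the same pin budget")
```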
14. AMD EPYC Genoa with OMI Memory
OCP-HPC Module Block Schematic
• 320x Transceiver Lanes in Total
• 128x CXL/PCIe Lanes
• 192x OMI Lanes
[Block schematic: AMD EPYC Genoa with OMI Memory fans out 8-lane OMI channels plus 8-lane and 16-lane CXL/PCIe-G5 channels. The OMI channels run over EDSFF TA-1002 4C/4C+ connectors to E3.S dual-OMI-channel DDR5 modules of up to 512GByte each; the CXL/PCIe channels serve E3.S NVMe SSDs, a NIC 3.0, Nearstack PCIe x8 connectors, and cabled CXL/PCIe x16 and x8 IO.]
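The lane budget in the schematic is easy to sanity-check; a minimal sketch (Python, using the 8-lanes-per-OMI-channel and 64 GBytes/s-per-channel figures from the slide-4 table):

```python
# Transceiver-lane budget for the OCP-HPC module.
omi_lanes, cxl_pcie_lanes = 192, 128
assert omi_lanes + cxl_pcie_lanes == 320     # 320 lanes in total

omi_channels = omi_lanes // 8                # 8 lanes per OMI channel
omi_bw = omi_channels * 64                   # 64 GBytes/s per channel
print(f"{omi_channels} OMI channels -> {omi_bw} GBytes/s near memory")
# -> 24 OMI channels -> 1536 GBytes/s near memory
```

That 1.536 TBytes/s per socket is consistent with the 12.3 TBytes/s aggregate claimed for the 8-socket pooling server on slide 19.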
15. Fully Composable Compute Node Module
Leveraged from OCP's OAM Module - nicknamed OAM-HPC
• Modular, Flexible and Composable AMD EPYC HPC Compute Node
• Opportunity to reduce the OMI PHY Channel to 5-10dB, 1-2pJ/bit -> easier to achieve 51.2G NRZ - DDR5-6400
• Opportunity to place AMD EPYC Chiplets directly onto the OAM Substrate & remove the LGA4094 package
• Better Power and Signal Integrity
[Views: AMD EPYC Genoa with OMI, OAM-HPC Module top & bottom. The bottom view carries 12x Dual OMI Channel connectors, 4x CXL/PCIe x16 and 8x CXL/PCIe x8 connectors, and is common to all Processor/Accelerator implementations; shown populated with 12x E3.S OMI Modules, 4x E3.S NVMe SSDs & 8x Nearstack CXL/PCIe x8 Cables, and with cabled CXL/PCIe x8.]
18. Re-Architect - Start with a Cold Plate
For High Wattage OAM Modules
• Capillary Heatspreader on the module to dissipate die heat across the module's surface area
• Heatsinks are the largest mass, so make them the structure of the assembly
• Integrate liquid cooling into the main cold plate
[Photos: current air & water cooled OAMs; water-cooled cold plate with built-in 54V power busbars (x8).]
19. EPYC OMI CXL.mem Memory Pooling Server
Pluggable into OCP OAI Chassis
• 8x EPYC OMI Processors
• Up to 48 TBytes of OMI Memory
• 6 TBytes Local to each EPYC CPU
• 42 TBytes shared over CXL.mem
• 12.3 TBytes/s Aggregate Memory Bandwidth
• 24x PCIe-G5 x16 E3.S NVMe SSDs
• 512 GBytes/s CXL.mem external Memory Pooling Bandwidth
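These totals follow from the module counts on the earlier slides; a quick check (Python; module count from the slide-15 views, capacity from slide 14, 64 GBytes/s per OMI channel from slide 4, and the binary rounding 1 TByte = 1024 GBytes assumed to reproduce the slide's 6/48/42 figures):

```python
# Memory-pooling server totals (per-CPU module figures from slides 14/15).
cpus = 8
modules_per_cpu = 12          # E3.S dual-OMI-channel DDR5 modules
gbytes_per_module = 512       # up to 512 GBytes each

local = modules_per_cpu * gbytes_per_module / 1024   # 6 TBytes per CPU
total = cpus * local                                 # 48 TBytes
shared = total - local                               # 42 TBytes via CXL.mem

agg_bw = cpus * 24 * 64 / 1000    # 24 OMI channels/CPU at 64 GBytes/s
print(f"{total:.0f} TBytes total, {shared:.0f} TBytes shared, "
      f"{agg_bw:.3f} TBytes/s")   # 48, 42, 12.288 (slide rounds to 12.3)
```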
21. Alternative OMI DDIMMs
LPDDR5 Low Power and/or Improved Random Access
• LPDDR5 - Low Cost 3D stacked DRAM
• Wire Bond vs TSV
• High volume in Mobile devices
[Diagrams: a 32 or 64 GByte Low Power DDIMM with 4x LPDDR5 x16 @8000 devices behind an OMI Buffer Chip on an OMI-32G link, and a 64 or 128 GByte Low Power and Improved Random Access DDIMM with 8x LPDDR5 x16 @8000 devices behind the same OMI Buffer Chip.]
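As a final check, the DRAM side of these DDIMMs can be compared against the OMI-32G link (a minimal Python sketch; device figures from the diagram, link figures from the slide-4 table):

```python
# DRAM-side bandwidth of the LPDDR5 DDIMMs vs the OMI-32G link.
def lpddr5_gbytes(devices, width_bits=16, mts=8000):
    """Aggregate LPDDR5 bandwidth in GBytes/s (x16 devices @ 8000 MT/s)."""
    return devices * width_bits * mts / 8 / 1000

omi_32g = 8 * 32000 / 8 / 1000 * 2   # 8 lanes @ 32 GT/s, read + write

print(lpddr5_gbytes(4), omi_32g)     # 64.0 64.0 -> 4 devices fill the link
print(lpddr5_gbytes(8))              # 128.0 -> DRAM-side headroom
```

Four x16 devices exactly match the 64 GBytes/s link; the eight-device variant doubles capacity and, presumably through the extra device and bank parallelism, delivers the improved random access the slide advertises.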