Allan Cantle - 5/25/2021
If AMD Adopted OMI in their EPYC Architecture
Heterogeneous Computing’s Memory Challenge
• Today’s HPC, HPDA & ML applications need Heterogeneous Computing
• Heterogeneous Processors / Accelerators have varying memory needs
  • CPU -> Low Latency & Cache Line Random Access Bandwidth
  • GPU -> HBM Bandwidths
  • AI/ML -> High Bandwidth and Capacity
  • FPGA -> High Streaming Bandwidth
• Challenge: can a single Near/Local Memory bus support all of these requirements?
Memory Interface Comparison
OMI - Bandwidth of HBM at DDR Latency, Capacity & Cost
[Figure: DRAM capacity (TBytes, log scale) vs. memory bandwidth (TBytes/s, log scale) for DDR4 / DDR5, OMI and HBM2E]
Memory Interface Comparison
| Specification | LRDIMM DDR4 | DDR5 | HBM2E (8-High) | OMI |
|---|---|---|---|---|
| Protocol | Parallel | Parallel | Parallel | Serial |
| Signalling | Single-Ended | Single-Ended | Single-Ended | Differential |
| I/O Type | Duplex | Duplex | Simplex | Simplex |
| Lanes/Channel (Read/Write) | 64 | 32 | 512R/512W | 8R/8W |
| Lane Speed | 3,200MT/s | 6,400MT/s | 3,200MT/s | 32,000MT/s |
| Channel Bandwidth (R+W) | 25.6GBytes/s | 25.6GBytes/s | 400GBytes/s | 64GBytes/s |
| Latency | 41.5ns | ? | 60.4ns | 45.5ns |
| Driver Area / Channel | 7.8mm² | 3.9mm² | 11.4mm² | 2.2mm² |
| Bandwidth/mm² | 3.3GBytes/s/mm² | 6.6GBytes/s/mm² | 35GBytes/s/mm² | 29.6GBytes/s/mm² |
| Max Capacity / Channel | 64GB | 256GB | 16GB | 256GB |
| Connection | Multi Drop | Multi Drop | Point-to-Point | Point-to-Point |
| Data Resilience | Parity | Parity | Parity | CRC |
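As a sanity check on the table, the channel bandwidth row follows from lanes x transfer rate, assuming one bit per lane per transfer and counting read and write lanes together. A minimal Python sketch; the function and dictionary names are illustrative only:

```python
# Sketch: recompute each interface's peak channel bandwidth (R+W) as
# lanes x transfer rate, assuming 1 bit per lane per transfer.

def channel_bandwidth_gbytes(lanes: int, mega_transfers_per_s: float) -> float:
    """Peak channel bandwidth in GBytes/s for `lanes` data lanes."""
    bits_per_second = lanes * mega_transfers_per_s * 1e6
    return bits_per_second / 8 / 1e9


# name: (read + write data lanes, MT/s per lane, driver area per channel in mm^2)
INTERFACES = {
    "LRDIMM DDR4": (64, 3_200, 7.8),
    "DDR5": (32, 6_400, 3.9),
    "HBM2E (8-High)": (512 + 512, 3_200, 11.4),
    "OMI": (8 + 8, 32_000, 2.2),
}

if __name__ == "__main__":
    for name, (lanes, mts, area) in INTERFACES.items():
        bw = channel_bandwidth_gbytes(lanes, mts)
        print(f"{name:15s} {bw:6.1f} GBytes/s  {bw / area:5.1f} GBytes/s/mm2")
```

HBM2E works out to 409.6 GBytes/s, which the table rounds to 400 GBytes/s, and the OMI density comes out near 29.1 GBytes/s/mm² rather than 29.6, so the area and density figures appear to be rounded estimates.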
AMD EPYC Rome CPU
58.5mm x 75.4mm, 1mm pitch, LGA 4094 Socket SP3
IO die: 15.06mm x 27.63mm, 12nm GF process node
[Figure: package floorplan at a scale of 1mm : 10pts; legend: AMD CPU dies, AMD GPU dies, Xilinx FPGA dies, AI dies]
AMD EPYC Rome IO Die Analysis
AMD EPYC Rome IO Die**
8.34B transistors on GlobalFoundries 14nm/12nm? - 416mm²
Die shot 800pts x 436pts, so X = 1.8349Y
Y x 1.8349Y = 416mm², so Y ~ 15.06mm and X ~ 27.63mm
(Scale 1mm : 20pts)
DDR4 Memory Controller Area
4 Channels: 2.2mm x 14.2mm = 31.24mm²
Per Channel: 1.1mm x 7.1mm = 7.81mm²
Peak Bandwidth per Channel = 3200 MT/s x 8 Bytes = 25.6GBytes/s
Peak Bandwidth per Channel Area = 25.6GBytes/s / 7.81mm² = 3.28GBytes/s/mm²
Maximum Capacity per DDR4 DIMM = 64GB
** https://wccftech.com/amd-2nd-gen-epyc-rome-iod-ccd-chipshots-39-billion-transistors/
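The die-size estimate above solves for Y given that the die shot's 800pts x 436pts aspect ratio (1.8349) also holds for the physical die. A minimal Python sketch of that arithmetic, assuming an undistorted, axis-aligned die photo; the function names are hypothetical:

```python
import math


def die_dimensions(area_mm2: float, width_pts: float, height_pts: float) -> tuple[float, float]:
    """Return (X, Y) in mm for a rectangular die of known area, assuming the
    die shot is undistorted so X / Y equals width_pts / height_pts."""
    aspect = width_pts / height_pts          # X = aspect * Y
    y = math.sqrt(area_mm2 / aspect)         # aspect * Y^2 = area
    return aspect * y, y


def bandwidth_density(gbytes_per_s: float, area_mm2: float) -> float:
    """Peak bandwidth delivered per mm^2 of controller/PHY area."""
    return gbytes_per_s / area_mm2


if __name__ == "__main__":
    x, y = die_dimensions(416, 800, 436)     # Rome IO die estimate
    print(f"X ~ {x:.2f}mm, Y ~ {y:.2f}mm")
    ddr4_area = 1.1 * 7.1                    # mm^2 per DDR4 channel
    ddr4_bw = 3200e6 * 8 / 1e9               # 3200 MT/s x 8 Bytes, in GBytes/s
    print(f"{bandwidth_density(ddr4_bw, ddr4_area):.2f} GBytes/s/mm2")
```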
AMD EPYC Rome IO Die
Speeds, Feeds & Capacity
• Aggregate Peak Read + Write Bandwidths
  • PCIe-G4 = 4GB/s/lane - 512GB/s Total
  • DDR4-3200 = 25.6GB/s/DIMM - 204GB/s Total
  • ∞ Fabric = 6.25GBytes/s/lane, 800GB/s Total?
• Memory Bandwidth Oversubscribed*
  • 1:4 Memory : ∞ Fabric
  • 1:2.5 Memory : PCIe
  • ~1:6.4 Memory : ∞ Fabric & PCIe
• Memory Capacity @ 3200 = 64GB x 8 = 512GBytes
[Figure: IO die floorplan (scale 1mm : 20pts) with 2x PCIe-G4 64 lanes at 256 GBytes/s each, 4x ∞ Fabric x2 at 200 GBytes/s? each, and 2x DDR4-3200 x4 at 102 GBytes/s each]
*for Data Bound Problems
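The oversubscription ratios can be reproduced from the aggregate bandwidths listed above. A minimal Python sketch, assuming the slide's estimates (including the unconfirmed 200 GBytes/s per ∞ Fabric x2 group); the `IoDie` class is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class IoDie:
    """Aggregate peak read + write bandwidths in GBytes/s (slide estimates)."""
    memory: float
    infinity_fabric: float
    pcie: float

    def ratios(self) -> dict[str, float]:
        """Return N for each 1:N ratio of memory bandwidth to IO bandwidth."""
        return {
            "Memory : Infinity Fabric": self.infinity_fabric / self.memory,
            "Memory : PCIe": self.pcie / self.memory,
            "Memory : Infinity Fabric & PCIe": (self.infinity_fabric + self.pcie) / self.memory,
        }


if __name__ == "__main__":
    rome = IoDie(
        memory=8 * 25.6,          # 8x DDR4-3200 channels
        infinity_fabric=4 * 200,  # 4x "∞ Fabric x2" groups, unconfirmed estimate
        pcie=128 * 4,             # 128 PCIe-G4 lanes at 4 GBytes/s (R+W)
    )
    for name, ratio in rome.ratios().items():
        print(f"1:{ratio:.1f}  {name}")
```

With these inputs the ratios come out at about 1:3.9, 1:2.5 and 1:6.4.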
Mockup of AMD EPYC Genoa CPU?
LGA 6096 Socket - 75.4mm? x 75.4mm?, 0.92mm? pitch
WCCFTECH - Hardware Leak
TDP 120W to 320W, configurable up to 400W
Targeting 7nm
(Scale 1mm : 10pts)
AMD EPYC Genoa IO Die
Assumed Speeds, Feeds & Capacity
• Aggregate Peak Read + Write Bandwidths
  • PCIe-G5 = 8GB/s/lane - 1024 GB/s Total
  • DDR5-5200 = 41.6GB/s/DIMM - 500GB/s Total
  • ∞ Fabric = 8GB/s/lane, 1,536 GB/s Total?
• Memory Bandwidth Oversubscribed*
  • 1:3 Memory : ∞ Fabric
  • 1:2 Memory : PCIe
  • 1:5 Memory : ∞ Fabric & PCIe
• Memory Capacity @ 5200 = 256GB x 12 = 3 TBytes
[Figure: IO die floorplan with 2x CXL / PCIe-G5 64 lanes at 512 GBytes/s each, 4x ∞ Fabric x3 at 384 GBytes/s? each, and 2x DDR5-5200 x6 at 250 GBytes/s each]
WCCFTECH - Hardware Leak
*for Data Bound Problems
IBM POWER10 Die
POWER10: 18B transistors on Samsung 7nm - 602mm², ~24.26mm x ~24.82mm
Die photo courtesy of Samsung Foundry (Scale 1mm : 20pts)
OMI Memory Controller Area
2 Channels: 1.441mm x 2.626mm = 3.78mm²
Or 1.441mm x 1.313mm = 1.89mm² / Channel
Or 30.27mm² for 16x Channels
Peak Bandwidth per Channel = 32Gbit/s x 8 lanes x 2 (Tx + Rx) = 512Gbit/s = 64 GBytes/s
Peak Bandwidth per Area = 64 GBytes/s / 1.89mm² = 33.9 GBytes/s/mm²
Maximum DRAM Capacity per OMI DDIMM = 256GB
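The OMI figures use the same method as the Rome DDR4 estimate: split the measured PHY block across its channels, then divide peak bandwidth by area. A Python sketch under those assumptions (helper names are hypothetical); carrying the unrounded 1.892mm² through gives roughly 33.8 GBytes/s/mm², about 10x the ~3.28 GBytes/s/mm² estimated for the Rome DDR4 controller:

```python
def omi_channel_bandwidth_gbytes(gbit_per_lane: float, lanes_per_direction: int = 8) -> float:
    """Peak R+W bandwidth of one OMI channel in GBytes/s (Tx + Rx)."""
    return gbit_per_lane * lanes_per_direction * 2 / 8


def area_per_channel(block_w_mm: float, block_h_mm: float, channels: int) -> float:
    """Split a measured controller/PHY block evenly across its channels."""
    return block_w_mm * block_h_mm / channels


if __name__ == "__main__":
    omi_area = area_per_channel(1.441, 2.626, channels=2)   # POWER10 die-shot estimate
    omi_bw = omi_channel_bandwidth_gbytes(32)
    print(f"OMI: {omi_area:.2f}mm2/channel, {16 * omi_area:.2f}mm2 for 16 channels")
    print(f"OMI: {omi_bw:.0f} GBytes/s/channel, {omi_bw / omi_area:.1f} GBytes/s/mm2")

    ddr4_area = area_per_channel(2.2, 14.2, channels=4)     # Rome IO die estimate
    ratio = (omi_bw / omi_area) / (25.6 / ddr4_area)
    print(f"OMI bandwidth density ~ {ratio:.1f}x DDR4")
```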
32Gb/s x8 OMI Channel: 30dB @ <5pJ/bit
OMI Buffer Chip: 2.5W per 64GBytes/s Tx + Rx OMI Channel, at each end
16Gbit Monolithic Memory, JEDEC configurations:
• 32GByte 1U OMI DDIMM
• 64GByte 2U OMI DDIMM
• 256GByte 4U OMI DDIMM
Same TA-1002 EDSFF Connector
[Figure: OMI DDIMM with an OMI buffer chip driving DDR5 DRAM devices at 4000 MTPS]
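The 2.5W figure is consistent with the <5pJ/bit energy bound: 64 GBytes/s is 512 Gbit/s, and 512 Gbit/s x 5 pJ/bit is about 2.56W. A small Python check, assuming the pJ/bit figure applies at each end of the link; `link_power_watts` is a hypothetical helper:

```python
def link_power_watts(gbytes_per_s: float, picojoules_per_bit: float) -> float:
    """Power needed to move `gbytes_per_s` at a given energy per bit."""
    bits_per_s = gbytes_per_s * 8e9
    return bits_per_s * picojoules_per_bit * 1e-12


if __name__ == "__main__":
    # One x8 OMI-32G channel: 64 GBytes/s Tx + Rx at the <5pJ/bit upper bound
    print(f"{link_power_watts(64, 5):.2f} W per end")
```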
Today’s 25.6Gbit/s DDR4 OMI DDIMM
OMI lane speed is locked in ratio to the DDR speed:
• 21.33Gb/s x8 - DDR4-2667
• 25.6Gb/s x8 - DDR4/5-3200
• 32Gb/s x8 - DDR5-4000
• 38.4Gb/s - DDR5-4800
• 42.66Gb/s - DDR5-5333
• 51.2Gb/s - DDR5-6400
SerDes PHY latency: <2ns (without wire), mesochronous clocking
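Every entry in the list above is 8x the DDR transfer rate, so the 8 read lanes of an OMI channel carry the same bandwidth as a 64-bit DDR channel (8 x 25.6 Gb/s = 204.8 Gb/s = 25.6 GBytes/s at DDR4-3200). A Python sketch of that relationship; the 8:1 ratio is inferred from the list, not taken from the OMI specification:

```python
# Assumption: the 8:1 ratio is inferred from the slide's list, not quoted from the OMI spec.
OMI_TO_DDR_RATIO = 8


def omi_lane_rate_gbps(ddr_mt_per_s: float) -> float:
    """OMI lane rate (Gb/s) locked to a DDR speed grade given in MT/s."""
    return ddr_mt_per_s * OMI_TO_DDR_RATIO / 1000


if __name__ == "__main__":
    for grade in (2667, 3200, 4000, 4800, 5333, 6400):
        print(f"DDR-{grade}: {omi_lane_rate_gbps(grade):.2f} Gb/s per OMI lane")
```

DDR4-2667 is nominally 2666.67 MT/s, which is why the slide shows 21.33Gb/s where this integer input gives 21.34.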
Other Potential Emerging EDSFF Media Formats
• E3.S - Up to 512GByte, Dual OMI Channel (OMI PHY)
AMD EPYC Genoa IO Die with OMI - Concept
Speeds, Feeds & Capacity
• Aggregate Peak Read + Write Bandwidths
  • PCIe-G5 = 8GB/s/lane - 1024 GB/s Total
  • OMI-32G = 64GB/s/DDIMM - 1,536 GB/s Total
  • ∞ Fabric = 8GB/s/lane - 1,536 GB/s Total?
• Memory Bandwidth Balanced
  • 1:1 Memory : ∞ Fabric
  • 1:0.7 Memory : PCIe
  • 1:1.7 Memory : ∞ Fabric & PCIe
• Memory Capacity @ 32G = 256GB x 24 = 6 TBytes
[Figure: IO die floorplan (scale 1mm : 20pts) with 2x CXL / PCIe-G5 64 lanes at 512 GBytes/s each, 4x ∞ Fabric x3 at 384 GBytes/s? each, and 2x OMI-32G x12 at 768 GBytes/s each]
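The DDR5 and OMI Genoa configurations differ only in channel count and per-channel bandwidth, so their balance ratios can be computed side by side. A Python sketch using the slides' assumed Genoa IO bandwidths (4x 384 GBytes/s ∞ Fabric, 128 CXL / PCIe-G5 lanes at 8 GBytes/s); `MemoryConfig` is a hypothetical helper, and capacity is divided by 1024 to match the slides' 256GB x 12 = 3 TBytes:

```python
from dataclasses import dataclass

# Assumed Genoa IO bandwidths from the slides, aggregate R+W in GBytes/s
INFINITY_FABRIC = 4 * 384      # 4x "∞ Fabric x3" groups, unconfirmed
PCIE_G5 = 128 * 8              # 128 CXL / PCIe-G5 lanes at 8 GBytes/s


@dataclass
class MemoryConfig:
    name: str
    channels: int
    gbytes_per_s_per_channel: float   # aggregate R+W per DIMM / DDIMM
    gbytes_per_dimm: int              # capacity per DIMM / DDIMM

    @property
    def bandwidth(self) -> float:
        return self.channels * self.gbytes_per_s_per_channel

    @property
    def capacity_tbytes(self) -> float:
        return self.channels * self.gbytes_per_dimm / 1024


if __name__ == "__main__":
    for cfg in (MemoryConfig("DDR5-5200", 12, 41.6, 256),
                MemoryConfig("OMI-32G", 24, 64.0, 256)):
        bw = cfg.bandwidth
        print(f"{cfg.name}: {bw:.0f} GBytes/s, {cfg.capacity_tbytes:.0f} TBytes, "
              f"1:{INFINITY_FABRIC / bw:.1f} IF, 1:{PCIE_G5 / bw:.1f} PCIe, "
              f"1:{(INFINITY_FABRIC + PCIE_G5) / bw:.1f} IF & PCIe")
```

This gives roughly 1:3.1 / 1:2.1 / 1:5.1 for DDR5-5200 and 1:1.0 / 1:0.7 / 1:1.7 for OMI-32G, in line with the rounded ratios on the slides.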
AMD EPYC IO Die with OMI - Concept
Simultaneous Low Latency Near Memory & Far CXL.mem Sharing
• Aggregate Peak Read + Write Bandwidths
  • PCIe-G5 = 8GBytes/s/lane
  • OMI-32G = 64GBytes/s/DDIMM
  • ∞ Fabric = 8GBytes/s/lane, 16 Lanes/Channel???
• Memory Bandwidth Balanced
  • 1:1 Memory : ∞ Fabric
  • 1:0.7 Memory : PCIe
  • 1:1.7 Memory : ∞ Fabric & PCIe
• Memory Capacity @ 32G = 256GB x 24 = 6 TBytes
[Figure: IO die floorplan (scale 1mm : 20pts) with 2x CXL / PCIe-G5 64 lanes at 512 GBytes/s each, 4x ∞ Fabric x3 at 384 GBytes/s? each, and 2x OMI-32G x12 at 768 GBytes/s each]
https://wccftech.com/amd-zen-4-powered-epyc-genoa-7004-cpus-more-than-64-cores-epyc-embedded-3004-up-to-64-cores/
AMD EPYC IO Die with OMI - Benefits
• Balanced Memory Bandwidth to ∞ Fabric & PCIe Bandwidth
• Maintain EPYC Rome CPU LGA-4094 Socket size or smaller
  • OMI DDIMM uses 1/4 the pin count of a DDR DIMM Channel
  • 24 OMI Channels fit into the space of 6 DDR Channels
• Easier Motherboard routing - Fewer layers, lower cost
• Memory becomes composable with a SerDes interface
• Memory Technology Agnostic
  • e.g. LPDDR5 OMI DDIMM for lower power & better random access
AMD EPYC Genoa with OMI Memory
OCP-HPC Module Block Schematic
• 320x Transceiver Lanes in Total
  • 128x CXL/PCIe Lanes
  • 192x OMI Lanes
[Figure: AMD EPYC Genoa with OMI Memory, with EDSFF TA-1002 4C / 4C+ connectors for E3.S dual OMI channel DDR5 modules (up to 512GByte each) and an E3.S NVMe SSD, plus a NIC 3.0, Nearstack PCIe x8 connectors, and cabled CXL / PCIe x16 and x8 IO; legend: 8-lane OMI channel, 8-lane and 16-lane CXL / PCIe-G5 channels]
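The lane count and near-memory capacity of the module follow from its channel counts. A Python sketch, assuming each OMI lane counts one Tx and one Rx pair and that all 12 E3.S modules are populated at the 512GByte maximum; the constant names are hypothetical:

```python
# Lane and near-memory budget for the OAM-HPC module concept (slide figures).
OMI_CHANNELS = 24               # 12x E3.S modules, each with a dual OMI channel
LANES_PER_OMI_CHANNEL = 8       # x8 OMI channel; assumed: 1 lane = 1 Tx + 1 Rx pair
CXL_PCIE_LANES = 128
GBYTES_PER_E3S_MODULE = 512     # "Up to 512GByte" per dual-channel E3.S module

omi_lanes = OMI_CHANNELS * LANES_PER_OMI_CHANNEL
total_lanes = omi_lanes + CXL_PCIE_LANES
capacity_gbytes = (OMI_CHANNELS // 2) * GBYTES_PER_E3S_MODULE

if __name__ == "__main__":
    print(f"OMI lanes: {omi_lanes}, transceiver lanes in total: {total_lanes}")
    print(f"Near memory: {capacity_gbytes} GBytes ({capacity_gbytes / 1024:.0f} TBytes)")
```

The result matches both the 320-lane total and the 6 TBytes quoted for 24x 256GB DDIMMs.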
Fully Composable Compute Node Module
Leveraged from OCP’s OAM Module - nicknamed OAM-HPC
• Modular, Flexible and Composable AMD EPYC HPC Compute Node
• Opportunity to reduce the OMI PHY channel to 5-10dB, 1-2pJ/bit -> easier to achieve 51.2G NRZ (DDR5-6400)
• Opportunity to place AMD EPYC chiplets directly onto the OAM substrate & remove the LGA4094 package
  • Better Power and Signal Integrity
AMD EPYC Genoa with OMI
OAM-HPC Module Top & Bottom View
OAM-HPC Module Bottom View: populated with 12x E3.S OMI Modules, 4x E3.S NVMe SSDs & 8x Nearstack CXL/PCIe x8 Cables
OAM-HPC Module: Common Bottom View for all Processor/Accelerator Implementations
[Figure: common bottom view with 12x dual OMI channel, 4x CXL/PCIe x16 and 8x CXL/PCIe x8 connectors]
OAM-HPC Module Bottom View: populated with 12x E3.S OMI Modules and cabled CXL/PCIe x8
OCP Accelerator Infrastructure (OAI) Chassis
OCP-OAI Chassis with 8x OAM-HPC
Cable Configurable Topology - Fully Connected example
[Figure: chassis topology with 8x Fabric Expansion and 8x HIB boards]
Re-Architect - Start with a Cold Plate
For High Wattage OAM Modules
• Capillary heatspreader on the module to dissipate die heat across the module surface area
• Heatsinks are the largest mass, so make them the structure of the assembly
• Integrate liquid cooling into the main cold plate
[Figure: current air & water cooled OAMs; water cooled cold plate with built-in 54V power busbars, + x8]
EPYC OMI CXL.mem Memory Pooling Server
Pluggable into OCP OAI Chassis
• 8x EPYC OMI Processors
• Up to 48 TBytes of OMI Memory
  • 6 TBytes Local to each EPYC CPU
  • 42 TBytes shared over CXL.mem
• 12.3 TBytes/s Aggregate Memory Bandwidth
• 24x PCIe-G5 x16 E3.S NVMe SSDs
• 512 GBytes/s CXL.mem external Memory Pooling Bandwidth
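The pooling-server totals scale the single-node OMI figures by 8. A Python sketch, assuming 42 TBytes is the memory a given CPU reaches over CXL.mem (the other seven nodes' 6 TBytes each) and that 12.3 TBytes/s uses decimal units (1 TByte = 1000 GBytes); the constant names are hypothetical:

```python
# Aggregate figures for the 8-node EPYC OMI memory pooling server concept.
NODES = 8
LOCAL_TBYTES_PER_NODE = 6             # 24x 256GB OMI DDIMMs per EPYC CPU
MEMORY_GBYTES_PER_S_PER_NODE = 1536   # 24x OMI-32G channels at 64 GBytes/s

total_tbytes = NODES * LOCAL_TBYTES_PER_NODE
# Assumed reading: memory one CPU reaches over CXL.mem = every other node's local memory
shared_tbytes = total_tbytes - LOCAL_TBYTES_PER_NODE
# Decimal units (1 TByte = 1000 GBytes) reproduce the slide's 12.3 TBytes/s
aggregate_tbytes_per_s = NODES * MEMORY_GBYTES_PER_S_PER_NODE / 1000

if __name__ == "__main__":
    print(f"{total_tbytes} TBytes total, {shared_tbytes} TBytes over CXL.mem")
    print(f"{aggregate_tbytes_per_s:.1f} TBytes/s aggregate memory bandwidth")
```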
Questions?
Alternative OMI DDIMMs
LPDDR5 Low Power and/or Improved Random Access
• LPDDR5 - Low Cost 3D stacked DRAM
• Wire Bond vs TSV
• High volume in Mobile devices
[Figure: LPDDR5 OMI DDIMM variants, each with an OMI buffer chip on an OMI-32G link]
• 32 or 64 GByte Low Power DDIMM: 4x LPDDR5 x16 @8000
• 64 or 128 GByte Low Power and Improved Random Access DDIMM: 8x LPDDR5 x16 @8000
[Figure: AMD EPYC Genoa with OMI Memory, with 16-lane and 8-lane channel groups]
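The LPDDR5 variants can be compared with the 64 GBytes/s (R+W) of an OMI-32G link by their DRAM-side peak bandwidth. A rough Python sketch, assuming each x16 device transfers 16 bits per transfer at 8000 MT/s; the DRAM bandwidth is shared between reads and writes while the OMI figure is 32 GBytes/s each way, so the comparison is approximate. The helper name is hypothetical:

```python
def dram_bandwidth_gbytes(devices: int, bits_per_device: int, mt_per_s: int) -> float:
    """Peak DRAM-side bandwidth in GBytes/s for `devices` parts of a given width."""
    return devices * bits_per_device * mt_per_s / 8 / 1000


OMI_32G_CHANNEL = 32 * 8 * 2 / 8   # 64 GBytes/s: 32 Gb/s x 8 lanes x (Tx + Rx)

if __name__ == "__main__":
    for devices in (4, 8):
        bw = dram_bandwidth_gbytes(devices, 16, 8000)
        print(f"{devices}x LPDDR5 x16 @8000: {bw:.0f} GBytes/s "
              f"({bw / OMI_32G_CHANNEL:.1f}x one OMI-32G channel)")
```

Four devices roughly match one link, while eight devices offer twice the DRAM-side bandwidth and twice as many independent x16 interfaces, which may be what the "Improved Random Access" variant relies on.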

More Related Content

What's hot

Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor CoreZen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor CoreAMD
 
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APUHot Chips: AMD Next Gen 7nm Ryzen 4000 APU
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APUAMD
 
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World
OMI - The Missing Piece of a Modular, Flexible and Composable Computing WorldOMI - The Missing Piece of a Modular, Flexible and Composable Computing World
OMI - The Missing Piece of a Modular, Flexible and Composable Computing WorldAllan Cantle
 
Delivering the Future of High-Performance Computing
Delivering the Future of High-Performance ComputingDelivering the Future of High-Performance Computing
Delivering the Future of High-Performance ComputingAMD
 
All Presentations during CXL Forum at Flash Memory Summit 22
All Presentations during CXL Forum at Flash Memory Summit 22All Presentations during CXL Forum at Flash Memory Summit 22
All Presentations during CXL Forum at Flash Memory Summit 22Memory Fabric Forum
 
Heterogeneous Integration with 3D Packaging
Heterogeneous Integration with 3D PackagingHeterogeneous Integration with 3D Packaging
Heterogeneous Integration with 3D PackagingAMD
 
CXL Consortium Update: Advancing Coherent Connectivity
CXL Consortium Update: Advancing Coherent ConnectivityCXL Consortium Update: Advancing Coherent Connectivity
CXL Consortium Update: Advancing Coherent ConnectivityMemory Fabric Forum
 
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesAMD
 
DDR, GDDR, HBM Memory : Presentation
DDR, GDDR, HBM Memory : PresentationDDR, GDDR, HBM Memory : Presentation
DDR, GDDR, HBM Memory : PresentationSubhajit Sahu
 
Intel® QuickAssist Technology Introduction, Applications, and Lab, Including ...
Intel® QuickAssist Technology Introduction, Applications, and Lab, Including ...Intel® QuickAssist Technology Introduction, Applications, and Lab, Including ...
Intel® QuickAssist Technology Introduction, Applications, and Lab, Including ...Michelle Holley
 
Broadcom PCIe & CXL Switches OCP Final.pptx
Broadcom PCIe & CXL Switches OCP Final.pptxBroadcom PCIe & CXL Switches OCP Final.pptx
Broadcom PCIe & CXL Switches OCP Final.pptxMemory Fabric Forum
 
System On Chip (SOC)
System On Chip (SOC)System On Chip (SOC)
System On Chip (SOC)Shivam Gupta
 
System-on-Chip Design, Embedded System Design Challenges
System-on-Chip Design, Embedded System Design ChallengesSystem-on-Chip Design, Embedded System Design Challenges
System-on-Chip Design, Embedded System Design Challengespboulet
 
Enfabrica - Bridging the Network and Memory Worlds
Enfabrica - Bridging the Network and Memory WorldsEnfabrica - Bridging the Network and Memory Worlds
Enfabrica - Bridging the Network and Memory WorldsMemory Fabric Forum
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Andriy Berestovskyy
 
The Tofu Interconnect D for the Post K Supercomputer
The Tofu Interconnect D for the Post K SupercomputerThe Tofu Interconnect D for the Post K Supercomputer
The Tofu Interconnect D for the Post K Supercomputerinside-BigData.com
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingMichelle Holley
 
Slideshare - PCIe
Slideshare - PCIeSlideshare - PCIe
Slideshare - PCIeJin Wu
 
SK hynix CXL Disaggregated Memory Solution
SK hynix CXL Disaggregated Memory SolutionSK hynix CXL Disaggregated Memory Solution
SK hynix CXL Disaggregated Memory SolutionMemory Fabric Forum
 
Ayar Labs TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Pack...
Ayar Labs TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Pack...Ayar Labs TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Pack...
Ayar Labs TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Pack...inside-BigData.com
 

What's hot (20)

Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor CoreZen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
 
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APUHot Chips: AMD Next Gen 7nm Ryzen 4000 APU
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
 
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World
OMI - The Missing Piece of a Modular, Flexible and Composable Computing WorldOMI - The Missing Piece of a Modular, Flexible and Composable Computing World
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World
 
Delivering the Future of High-Performance Computing
Delivering the Future of High-Performance ComputingDelivering the Future of High-Performance Computing
Delivering the Future of High-Performance Computing
 
All Presentations during CXL Forum at Flash Memory Summit 22
All Presentations during CXL Forum at Flash Memory Summit 22All Presentations during CXL Forum at Flash Memory Summit 22
All Presentations during CXL Forum at Flash Memory Summit 22
 
Heterogeneous Integration with 3D Packaging
Heterogeneous Integration with 3D PackagingHeterogeneous Integration with 3D Packaging
Heterogeneous Integration with 3D Packaging
 
CXL Consortium Update: Advancing Coherent Connectivity
CXL Consortium Update: Advancing Coherent ConnectivityCXL Consortium Update: Advancing Coherent Connectivity
CXL Consortium Update: Advancing Coherent Connectivity
 
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
 
DDR, GDDR, HBM Memory : Presentation
DDR, GDDR, HBM Memory : PresentationDDR, GDDR, HBM Memory : Presentation
DDR, GDDR, HBM Memory : Presentation
 
Intel® QuickAssist Technology Introduction, Applications, and Lab, Including ...
Intel® QuickAssist Technology Introduction, Applications, and Lab, Including ...Intel® QuickAssist Technology Introduction, Applications, and Lab, Including ...
Intel® QuickAssist Technology Introduction, Applications, and Lab, Including ...
 
Broadcom PCIe & CXL Switches OCP Final.pptx
Broadcom PCIe & CXL Switches OCP Final.pptxBroadcom PCIe & CXL Switches OCP Final.pptx
Broadcom PCIe & CXL Switches OCP Final.pptx
 
System On Chip (SOC)
System On Chip (SOC)System On Chip (SOC)
System On Chip (SOC)
 
System-on-Chip Design, Embedded System Design Challenges
System-on-Chip Design, Embedded System Design ChallengesSystem-on-Chip Design, Embedded System Design Challenges
System-on-Chip Design, Embedded System Design Challenges
 
Enfabrica - Bridging the Network and Memory Worlds
Enfabrica - Bridging the Network and Memory WorldsEnfabrica - Bridging the Network and Memory Worlds
Enfabrica - Bridging the Network and Memory Worlds
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)
 
The Tofu Interconnect D for the Post K Supercomputer
The Tofu Interconnect D for the Post K SupercomputerThe Tofu Interconnect D for the Post K Supercomputer
The Tofu Interconnect D for the Post K Supercomputer
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet Processing
 
Slideshare - PCIe
Slideshare - PCIeSlideshare - PCIe
Slideshare - PCIe
 
SK hynix CXL Disaggregated Memory Solution
SK hynix CXL Disaggregated Memory SolutionSK hynix CXL Disaggregated Memory Solution
SK hynix CXL Disaggregated Memory Solution
 
Ayar Labs TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Pack...
Ayar Labs TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Pack...Ayar Labs TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Pack...
Ayar Labs TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Pack...
 

Similar to If AMD Adopted OMI in their EPYC Architecture

Ics21 workshop decoupling compute from memory, storage &amp; io with omi - ...
Ics21 workshop   decoupling compute from memory, storage &amp; io with omi - ...Ics21 workshop   decoupling compute from memory, storage &amp; io with omi - ...
Ics21 workshop decoupling compute from memory, storage &amp; io with omi - ...Vaibhav R
 
OpenPOWER Summit 2020 - OpenCAPI Keynote
OpenPOWER Summit 2020 -  OpenCAPI KeynoteOpenPOWER Summit 2020 -  OpenCAPI Keynote
OpenPOWER Summit 2020 - OpenCAPI KeynoteAllan Cantle
 
MemVerge: Memory Expansion Without Breaking the Budget
MemVerge: Memory Expansion Without Breaking the BudgetMemVerge: Memory Expansion Without Breaking the Budget
MemVerge: Memory Expansion Without Breaking the BudgetMemory Fabric Forum
 
4 p9 architecture overview japan meetup
4 p9 architecture overview japan meetup4 p9 architecture overview japan meetup
4 p9 architecture overview japan meetupYutaka Kawai
 
DDR, GDDR, HBM SDRAM Memory
DDR, GDDR, HBM SDRAM MemoryDDR, GDDR, HBM SDRAM Memory
DDR, GDDR, HBM SDRAM MemorySubhajit Sahu
 
AMD Opteron™ 6200 Series Processor Guide, Silicon Mechanics
AMD Opteron™ 6200 Series Processor Guide, Silicon MechanicsAMD Opteron™ 6200 Series Processor Guide, Silicon Mechanics
AMD Opteron™ 6200 Series Processor Guide, Silicon Mechanicswaltermoss123
 
Fujitsu Presents Post-K CPU Specifications
Fujitsu Presents Post-K CPU SpecificationsFujitsu Presents Post-K CPU Specifications
Fujitsu Presents Post-K CPU Specificationsinside-BigData.com
 
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028ssuser5b12d1
 
IBM System x3850 X5 Technical Presentation
IBM System x3850 X5 Technical PresentationIBM System x3850 X5 Technical Presentation
IBM System x3850 X5 Technical PresentationCliff Kinard
 
Product Roadmap iEi 2017
Product Roadmap iEi 2017Product Roadmap iEi 2017
Product Roadmap iEi 2017Andrei Teleanu
 
Argonne's Theta Supercomputer Architecture
Argonne's Theta Supercomputer ArchitectureArgonne's Theta Supercomputer Architecture
Argonne's Theta Supercomputer Architectureinside-BigData.com
 
PowerEdge Rack and Tower Server Masters - AMD Server Memory.pptx
PowerEdge Rack and Tower Server Masters - AMD Server Memory.pptxPowerEdge Rack and Tower Server Masters - AMD Server Memory.pptx
PowerEdge Rack and Tower Server Masters - AMD Server Memory.pptxNeoKenj
 
SUN主机产品介绍.ppt
SUN主机产品介绍.pptSUN主机产品介绍.ppt
SUN主机产品介绍.pptPencilData
 
Memory module
Memory moduleMemory module
Memory modulelemar12
 
Spansion HyperRam presentation
Spansion HyperRam presentationSpansion HyperRam presentation
Spansion HyperRam presentationSpansion
 
Theta and the Future of Accelerator Programming
Theta and the Future of Accelerator ProgrammingTheta and the Future of Accelerator Programming
Theta and the Future of Accelerator Programminginside-BigData.com
 
X3850x5techpresentation09 29-2010-101118124714-phpapp01
X3850x5techpresentation09 29-2010-101118124714-phpapp01X3850x5techpresentation09 29-2010-101118124714-phpapp01
X3850x5techpresentation09 29-2010-101118124714-phpapp01Yalçin KARACA
 

Similar to If AMD Adopted OMI in their EPYC Architecture (20)

Ics21 workshop decoupling compute from memory, storage &amp; io with omi - ...
Ics21 workshop   decoupling compute from memory, storage &amp; io with omi - ...Ics21 workshop   decoupling compute from memory, storage &amp; io with omi - ...
Ics21 workshop decoupling compute from memory, storage &amp; io with omi - ...
 
OpenPOWER Summit 2020 - OpenCAPI Keynote
OpenPOWER Summit 2020 -  OpenCAPI KeynoteOpenPOWER Summit 2020 -  OpenCAPI Keynote
OpenPOWER Summit 2020 - OpenCAPI Keynote
 
MemVerge: Memory Expansion Without Breaking the Budget
MemVerge: Memory Expansion Without Breaking the BudgetMemVerge: Memory Expansion Without Breaking the Budget
MemVerge: Memory Expansion Without Breaking the Budget
 
4 p9 architecture overview japan meetup
4 p9 architecture overview japan meetup4 p9 architecture overview japan meetup
4 p9 architecture overview japan meetup
 
DDR, GDDR, HBM SDRAM Memory
DDR, GDDR, HBM SDRAM MemoryDDR, GDDR, HBM SDRAM Memory
DDR, GDDR, HBM SDRAM Memory
 
AMD Opteron™ 6200 Series Processor Guide, Silicon Mechanics
AMD Opteron™ 6200 Series Processor Guide, Silicon MechanicsAMD Opteron™ 6200 Series Processor Guide, Silicon Mechanics
AMD Opteron™ 6200 Series Processor Guide, Silicon Mechanics
 
Fujitsu Presents Post-K CPU Specifications
Fujitsu Presents Post-K CPU SpecificationsFujitsu Presents Post-K CPU Specifications
Fujitsu Presents Post-K CPU Specifications
 
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
 
IBM System x3850 X5 Technical Presentation
IBM System x3850 X5 Technical PresentationIBM System x3850 X5 Technical Presentation
IBM System x3850 X5 Technical Presentation
 
Product Roadmap iEi 2017
Product Roadmap iEi 2017Product Roadmap iEi 2017
Product Roadmap iEi 2017
 
Argonne's Theta Supercomputer Architecture
Argonne's Theta Supercomputer ArchitectureArgonne's Theta Supercomputer Architecture
Argonne's Theta Supercomputer Architecture
 
PowerEdge Rack and Tower Server Masters - AMD Server Memory.pptx
PowerEdge Rack and Tower Server Masters - AMD Server Memory.pptxPowerEdge Rack and Tower Server Masters - AMD Server Memory.pptx
PowerEdge Rack and Tower Server Masters - AMD Server Memory.pptx
 
SUN主机产品介绍.ppt
SUN主机产品介绍.pptSUN主机产品介绍.ppt
SUN主机产品介绍.ppt
 
Memory module
Memory moduleMemory module
Memory module
 
Phytium 64 core cpu preview
Phytium 64 core cpu previewPhytium 64 core cpu preview
Phytium 64 core cpu preview
 
Spansion HyperRam presentation
Spansion HyperRam presentationSpansion HyperRam presentation
Spansion HyperRam presentation
 
Theta and the Future of Accelerator Programming
Theta and the Future of Accelerator ProgrammingTheta and the Future of Accelerator Programming
Theta and the Future of Accelerator Programming
 
Memoryhierarchy
MemoryhierarchyMemoryhierarchy
Memoryhierarchy
 
X3850x5techpresentation09 29-2010-101118124714-phpapp01
X3850x5techpresentation09 29-2010-101118124714-phpapp01X3850x5techpresentation09 29-2010-101118124714-phpapp01
X3850x5techpresentation09 29-2010-101118124714-phpapp01
 
Amd Athlon Processors
Amd Athlon ProcessorsAmd Athlon Processors
Amd Athlon Processors
 

Recently uploaded

Presentation.pptxjnfoigneoifnvoeifnvklfnvf
Presentation.pptxjnfoigneoifnvoeifnvklfnvfPresentation.pptxjnfoigneoifnvoeifnvklfnvf
Presentation.pptxjnfoigneoifnvoeifnvklfnvfchapmanellie27
 
NO1 Qualified Best Black Magic Specialist Near Me Spiritual Healer Powerful L...
NO1 Qualified Best Black Magic Specialist Near Me Spiritual Healer Powerful L...NO1 Qualified Best Black Magic Specialist Near Me Spiritual Healer Powerful L...
NO1 Qualified Best Black Magic Specialist Near Me Spiritual Healer Powerful L...Amil baba
 
Real Sure (Call Girl) in I.G.I. Airport 8377087607 Hot Call Girls In Delhi NCR
Real Sure (Call Girl) in I.G.I. Airport 8377087607 Hot Call Girls In Delhi NCRReal Sure (Call Girl) in I.G.I. Airport 8377087607 Hot Call Girls In Delhi NCR
Real Sure (Call Girl) in I.G.I. Airport 8377087607 Hot Call Girls In Delhi NCRdollysharma2066
 
5S - House keeping (Seiri, Seiton, Seiso, Seiketsu, Shitsuke)
5S - House keeping (Seiri, Seiton, Seiso, Seiketsu, Shitsuke)5S - House keeping (Seiri, Seiton, Seiso, Seiketsu, Shitsuke)
5S - House keeping (Seiri, Seiton, Seiso, Seiketsu, Shitsuke)861c7ca49a02
 
vip Krishna Nagar Call Girls 9999965857 Call or WhatsApp Now Book
vip Krishna Nagar Call Girls 9999965857 Call or WhatsApp Now Bookvip Krishna Nagar Call Girls 9999965857 Call or WhatsApp Now Book
vip Krishna Nagar Call Girls 9999965857 Call or WhatsApp Now Bookmanojkuma9823
 
Call Girls Delhi {Rohini} 9711199012 high profile service
Call Girls Delhi {Rohini} 9711199012 high profile serviceCall Girls Delhi {Rohini} 9711199012 high profile service
Call Girls Delhi {Rohini} 9711199012 high profile servicerehmti665
 
Alambagh Call Girl 9548273370 , Call Girls Service Lucknow
Alambagh Call Girl 9548273370 , Call Girls Service LucknowAlambagh Call Girl 9548273370 , Call Girls Service Lucknow
Alambagh Call Girl 9548273370 , Call Girls Service Lucknowmakika9823
 
定制宾州州立大学毕业证(PSU毕业证) 成绩单留信学历认证原版一比一
定制宾州州立大学毕业证(PSU毕业证) 成绩单留信学历认证原版一比一定制宾州州立大学毕业证(PSU毕业证) 成绩单留信学历认证原版一比一
定制宾州州立大学毕业证(PSU毕业证) 成绩单留信学历认证原版一比一ga6c6bdl
 
Call Girls In Munirka>༒9599632723 Incall_OutCall Available
Call Girls In Munirka>༒9599632723 Incall_OutCall AvailableCall Girls In Munirka>༒9599632723 Incall_OutCall Available
Call Girls In Munirka>༒9599632723 Incall_OutCall AvailableCall Girls in Delhi
 
定制(RHUL学位证)伦敦大学皇家霍洛威学院毕业证成绩单原版一比一
定制(RHUL学位证)伦敦大学皇家霍洛威学院毕业证成绩单原版一比一定制(RHUL学位证)伦敦大学皇家霍洛威学院毕业证成绩单原版一比一
定制(RHUL学位证)伦敦大学皇家霍洛威学院毕业证成绩单原版一比一ss ss
 
办理(CSU毕业证书)澳洲查理斯特大学毕业证成绩单原版一比一
办理(CSU毕业证书)澳洲查理斯特大学毕业证成绩单原版一比一办理(CSU毕业证书)澳洲查理斯特大学毕业证成绩单原版一比一
办理(CSU毕业证书)澳洲查理斯特大学毕业证成绩单原版一比一diploma 1
 
Call Girls in Dwarka Sub City 💯Call Us 🔝8264348440🔝
Call Girls in Dwarka Sub City 💯Call Us 🔝8264348440🔝Call Girls in Dwarka Sub City 💯Call Us 🔝8264348440🔝
Call Girls in Dwarka Sub City 💯Call Us 🔝8264348440🔝soniya singh
 
(办理学位证)韩国汉阳大学毕业证成绩单原版一比一
(办理学位证)韩国汉阳大学毕业证成绩单原版一比一(办理学位证)韩国汉阳大学毕业证成绩单原版一比一
(办理学位证)韩国汉阳大学毕业证成绩单原版一比一C SSS
 
《伯明翰城市大学毕业证成绩单购买》学历证书学位证书区别《复刻原版1:1伯明翰城市大学毕业证书|修改BCU成绩单PDF版》Q微信741003700《BCU学...
《伯明翰城市大学毕业证成绩单购买》学历证书学位证书区别《复刻原版1:1伯明翰城市大学毕业证书|修改BCU成绩单PDF版》Q微信741003700《BCU学...《伯明翰城市大学毕业证成绩单购买》学历证书学位证书区别《复刻原版1:1伯明翰城市大学毕业证书|修改BCU成绩单PDF版》Q微信741003700《BCU学...
《伯明翰城市大学毕业证成绩单购买》学历证书学位证书区别《复刻原版1:1伯明翰城市大学毕业证书|修改BCU成绩单PDF版》Q微信741003700《BCU学...ur8mqw8e
 
定制(UI学位证)爱达荷大学毕业证成绩单原版一比一
定制(UI学位证)爱达荷大学毕业证成绩单原版一比一定制(UI学位证)爱达荷大学毕业证成绩单原版一比一
定制(UI学位证)爱达荷大学毕业证成绩单原版一比一ss ss
 
Russian Call Girls In South Delhi Delhi 9711199012 💋✔💕😘 Independent Escorts D...
Russian Call Girls In South Delhi Delhi 9711199012 💋✔💕😘 Independent Escorts D...Russian Call Girls In South Delhi Delhi 9711199012 💋✔💕😘 Independent Escorts D...
Russian Call Girls In South Delhi Delhi 9711199012 💋✔💕😘 Independent Escorts D...nagunakhan
 

Recently uploaded (20)

Presentation.pptxjnfoigneoifnvoeifnvklfnvf
Presentation.pptxjnfoigneoifnvoeifnvklfnvfPresentation.pptxjnfoigneoifnvoeifnvklfnvf
Presentation.pptxjnfoigneoifnvoeifnvklfnvf
 
young call girls in Gtb Nagar,🔝 9953056974 🔝 escort Service
young call girls in Gtb Nagar,🔝 9953056974 🔝 escort Serviceyoung call girls in Gtb Nagar,🔝 9953056974 🔝 escort Service
young call girls in Gtb Nagar,🔝 9953056974 🔝 escort Service
 
9953330565 Low Rate Call Girls In Jahangirpuri Delhi NCR
9953330565 Low Rate Call Girls In Jahangirpuri  Delhi NCR9953330565 Low Rate Call Girls In Jahangirpuri  Delhi NCR
9953330565 Low Rate Call Girls In Jahangirpuri Delhi NCR
 

If AMD Adopted OMI in their EPYC Architecture

  • 1. Allan Cantle - 5/25/2021 If AMD Adopted OMI in their EPYC Architecture
  • 2. Heterogeneous Computing’s Memory Challenge
    • Today’s HPC, HPDA & ML applications need heterogeneous computing
    • Heterogeneous processors / accelerators have varying memory needs:
      • CPU -> Low latency & cache-line random-access bandwidth
      • GPU -> HBM bandwidths
      • AI/ML -> High bandwidth and capacity
      • FPGA -> High streaming bandwidth
    • Challenge: can one near/local memory bus support all of these requirements?
  • 3. Memory Interface Comparison: OMI - Bandwidth of HBM at DDR Latency, Capacity & Cost
    [Chart: DRAM capacity (TBytes, log scale) vs. memory bandwidth (TBytes/s, log scale), comparing DDR4 / DDR5, OMI and HBM2E]
  • 4. Memory Interface Comparison
    Specification              | LRDIMM DDR4      | DDR5             | HBM2E (8-High)  | OMI
    Protocol                   | Parallel         | Parallel         | Parallel        | Serial
    Signalling                 | Single-Ended     | Single-Ended     | Single-Ended    | Differential
    I/O Type                   | Duplex           | Duplex           | Simplex         | Simplex
    Lanes/Channel (Read/Write) | 64               | 32               | 512R/512W       | 8R/8W
    Lane Speed                 | 3,200 MT/s       | 6,400 MT/s       | 3,200 MT/s      | 32,000 MT/s
    Channel Bandwidth (R+W)    | 25.6 GBytes/s    | 25.6 GBytes/s    | 400 GBytes/s    | 64 GBytes/s
    Latency                    | 41.5 ns          | ?                | 60.4 ns         | 45.5 ns
    Driver Area/Channel        | 7.8 mm²          | 3.9 mm²          | 11.4 mm²        | 2.2 mm²
    Bandwidth/mm²              | 3.3 GBytes/s/mm² | 6.6 GBytes/s/mm² | 35 GBytes/s/mm² | 29.6 GBytes/s/mm²
    Max Capacity/Channel       | 64 GB            | 256 GB           | 16 GB           | 256 GB
    Connection                 | Multi Drop       | Multi Drop       | Point-to-Point  | Point-to-Point
    Data Resilience            | Parity           | Parity           | Parity          | CRC
  • 5. AMD EPYC Rome CPU
    • Package: 58.5mm x 75.4mm, 1mm pitch, LGA 4094, Socket SP3
    • IO die: 15.06mm x 27.63mm, 12nm GF process node
    [Figure legend: AMD CPU Dies, AMD GPU Dies, Xilinx FPGA Dies, AI Dies; scale 1mm : 10pts]
  • 6. AMD EPYC Rome IO Die Analysis
    • AMD EPYC Rome IO die**: 8.34B transistors on TSMC 14nm/12nm? - 416mm²
    • Die shot is 800pts x 436pts, so X = 1.8349Y; solving Y x 1.8349Y = 416mm² gives Y ~ 15.06mm and X ~ 27.63mm
    • Scale 1mm : 20pts
    • DDR4 memory controller area: 4 channels in 2.2mm x 14.2mm = 31.24mm², i.e. 1.1mm x 7.1mm = 7.81mm² per channel
    • Peak bandwidth per channel = 3200 MT/s x 8 Bytes = 25.6 GBytes/s
    • Peak bandwidth per channel area = 25.6 GBytes/s / 7.81mm² = 3.28 GBytes/s/mm²
    • Maximum capacity per DDR4 DIMM = 64GB
    ** https://wccftech.com/amd-2nd-gen-epyc-rome-iod-ccd-chipshots-39-billion-transistors/
  • 7. AMD EPYC Rome IO Die Speeds, Feeds & Capacity
    • Aggregate peak Read + Write bandwidths:
      • PCIe-G4 = 4GB/s/lane - 512GB/s total
      • DDR4-3200 = 25.6GB/s/DIMM - 204GB/s total
      • ∞ Fabric = 6.25GBytes/s/lane - 800GB/s total?
    • Memory bandwidth oversubscribed*:
      • 1:4 Memory : ∞ Fabric
      • 1:2.5 Memory : PCIe
      • ~1:6.4 Memory : ∞ Fabric & PCIe
    • Memory capacity @ 3200 = 64GB x 8 = 512GBytes
    [Diagram labels: 2x PCIe-G4 64 lanes, 256GBytes/s each; 4x ∞ Fabric x2, 200GBytes/s? each; 2x DDR4-3200 x4, 102GBytes/s each; scale 1mm : 20pts]
    *for data-bound problems
  • 8. Mockup of AMD EPYC Genoa CPU?
    • LGA 6096 socket: 75.4mm? x 75.4mm?, 0.92mm? pitch (WCCFTECH hardware leak)
    • TDP 120W to 320W, configurable up to 400W
    • Targeting 7nm
    [Scale 1mm : 10pts]
  • 9. AMD EPYC Genoa IO Die Assumed Speeds, Feeds & Capacity
    • Aggregate peak Read + Write bandwidths:
      • PCIe-G5 = 8GB/s/lane - 1024 GB/s total
      • DDR5-5200 = 41.6GB/s/DIMM - 500GB/s total
      • ∞ Fabric = 8GB/s/lane - 1,536 GB/s total?
    • Memory bandwidth oversubscribed*:
      • 1:3 Memory : ∞ Fabric
      • 1:2 Memory : PCIe
      • 1:5 Memory : ∞ Fabric & PCIe
    • Memory capacity @ 5200 = 256GB x 12 = 3 TBytes
    [Diagram labels: 2x CXL / PCIe-G5 64 lanes, 512 GBytes/s each; 4x ∞ Fabric x3, 384 GBytes/s? each; 2x DDR5-5200 x6, 250GBytes/s each; source: WCCFTECH hardware leak]
    *for data-bound problems
  • 10. IBM POWER10 Die
    • POWER10: 18B transistors on Samsung 7nm - 602mm², ~24.26mm x ~24.82mm (die photo courtesy of Samsung Foundry; scale 1mm : 20pts)
    • OMI memory controller area: 2 channels in 1.441mm x 2.626mm = 3.78mm², i.e. 1.441mm x 1.313mm = 1.89mm² per channel, or 30.27mm² for 16 channels
    • Peak bandwidth per channel = 32 Gbit/s x 8 lanes x 2 (Tx + Rx) = 512 Gbit/s = 64 GBytes/s
    • Peak bandwidth per area = 64 GBytes/s / 1.89mm² = 33.9 GBytes/s/mm²
    • Maximum DRAM capacity per OMI DDIMM = 256GB
    • OMI channel: 32Gb/s x8, 30dB @ <5pJ/bit; OMI PHY with <2ns SerDes PHY latency (without wire) at each end; mesochronous clocking
    • OMI buffer chip: 2.5W per 64GBytes/s Tx + Rx OMI channel
    • DDR5 @ 4000 MTPS, 16Gbit monolithic memory, JEDEC configurations: 32GByte 1U OMI DDIMM, 64GByte 2U OMI DDIMM, 256GByte 4U OMI DDIMM, all on the same TA-1002 EDSFF connector
    • Today’s 25.6Gbit/s DDR4 OMI DDIMM
    • OMI lane rate is locked in ratio to the DDR speed:
      21.33Gb/s x8 - DDR4-2667; 25.6Gb/s x8 - DDR4/5-3200; 32Gb/s x8 - DDR5-4000; 38.4Gb/s - DDR5-4800; 42.66Gb/s - DDR5-5333; 51.2Gb/s - DDR5-6400
    • Other potential emerging EDSFF media formats: E3.S, up to 512GByte, dual OMI channel
  • 11. AMD EPYC Genoa IO Die with OMI - Concept Speeds, Feeds & Capacity
    • Aggregate peak Read + Write bandwidths:
      • PCIe-G5 = 8GB/s/lane - 1024 GB/s total
      • OMI-32G = 64GB/s/DDIMM - 1,536 GB/s total
      • ∞ Fabric = 8GB/s/lane - 1,536 GB/s total?
    • Memory bandwidth balanced:
      • 1:1 Memory : ∞ Fabric
      • 1:0.7 Memory : PCIe
      • 1:1.7 Memory : ∞ Fabric & PCIe
    • Memory capacity @ 32G = 256GB x 24 = 6 TBytes
    [Diagram labels: 2x CXL / PCIe-G5 64 lanes, 512 GBytes/s each; 4x ∞ Fabric x3, 384 GBytes/s? each; 2x OMI-32G x12, 768GBytes/s each; scale 1mm : 20pts]
  • 12. AMD EPYC IO Die with OMI - Concept: Simultaneous Low-Latency Near Memory & Far CXL.mem Sharing
    • Aggregate peak Read + Write bandwidths:
      • PCIe-G5 = 8GBytes/s/lane
      • OMI-32G = 64GBytes/s/DDIMM
      • ∞ Fabric = 8GBytes/s/lane, 16 lanes/channel???
    • Memory bandwidth balanced:
      • 1:1 Memory : ∞ Fabric
      • 1:0.7 Memory : PCIe
      • 1:1.7 Memory : ∞ Fabric & PCIe
    • Memory capacity @ 32G = 256GB x 24 = 6 TBytes
    [Diagram labels: 2x CXL / PCIe-G5 64 lanes, 512 GBytes/s each; 4x ∞ Fabric x3, 384 GBytes/s? each; 2x OMI-32G x12, 768GBytes/s each; scale 1mm : 20pts]
    https://wccftech.com/amd-zen-4-powered-epyc-genoa-7004-cpus-more-than-64-cores-epyc-embedded-3004-up-to-64-cores/
  • 13. AMD EPYC IO Die with OMI - Benefits
    • Balanced memory bandwidth relative to ∞ Fabric & PCIe bandwidth
    • Maintains the EPYC Rome CPU LGA-4094 socket size or smaller
    • An OMI DDIMM uses 1/4 the pin count of a DDR DIMM channel
    • 24 OMI channels fit into the space of 6 DDR channels
    • Easier motherboard routing: fewer layers, lower cost
    • Memory becomes composable with a SerDes interface
    • Memory technology agnostic, e.g. LPDDR5 OMI DDIMM for improved power & better random access
  • 14. AMD EPYC Genoa with OMI Memory: OCP-HPC Module Block Schematic
    • 320x transceiver lanes in total: 128x CXL/PCIe lanes, 192x OMI lanes
    • EDSFF TA-1002 4C / 4C+ connector
    [Diagram: AMD EPYC Genoa with OMI memory; two groups of 16, 16, 8, 8, 8, 8-lane CXL / PCIe-G5 channels; 8-lane OMI channels to E3.S up to 512GByte dual OMI channel DDR5 modules; E3.S NVMe SSD; NIC 3.0; Nearstack PCIe x8 connector; cabled CXL / PCIe x16 and x8 IO]
  • 15. Fully Composable Compute Node Module, Leveraged from OCP’s OAM Module (nicknamed OAM-HPC)
    • Modular, flexible and composable AMD EPYC HPC compute node
    • Opportunity to reduce the OMI PHY channel loss to 5 - 10dB at 1 - 2pJ/bit -> easier to achieve 51.2G NRZ (DDR5-6400)
    • Opportunity to place AMD EPYC chiplets directly onto the OAM substrate & remove the LGA4094 package
    • Better power and signal integrity
    [Figures: AMD EPYC Genoa with OMI OAM-HPC module, top & bottom views; bottom view populated with 12x E3.S OMI modules, 4x E3.S NVMe SSDs & 8x Nearstack CXL/PCIe x8 cables; common bottom view for all processor/accelerator implementations with 12x dual OMI channels, 4x CXL/PCIe x16 and 8x CXL/PCIe x8; bottom view populated with 12x E3.S OMI modules and cabled CXL/PCIe x8]
  • 17. OCP-OAI Chassis with 8x OAM-HPC: Cable-Configurable Topology, Fully Connected Example
    [Diagram: 8x Fabric Expansion, 8x HIB]
  • 18. Re-Architect: Start with a Cold Plate for High-Wattage OAM Modules
    • Capillary heatspreader on the module to dissipate die heat across the module surface area
    • Heatsinks are the largest mass, so make them the structure of the assembly
    • Integrate liquid cooling into the main cold plate
    [Figures: current air- & water-cooled OAMs; water-cooled cold plate with built-in 54V power busbars, x8]
  • 19. EPYC OMI CXL.mem Memory Pooling Server, Pluggable into OCP OAI Chassis
    • 8x EPYC OMI processors
    • Up to 48 TBytes of OMI memory:
      • 6 TBytes local to each EPYC CPU
      • 42 TBytes shared over CXL.mem
    • 12.3 TBytes/s aggregate memory bandwidth
    • 24x PCIe-G5 x16 E3.S NVMe SSDs
    • 512 GBytes/s CXL.mem external memory pooling bandwidth
  • 21. Alternative OMI DDIMMs: LPDDR5 for Low Power and/or Improved Random Access
    • LPDDR5: low-cost 3D-stacked DRAM
    • Wire bond vs TSV
    • High volume in mobile devices
    [Diagrams: 32 or 64 GByte low-power DDIMM, and 64 or 128 GByte low-power, improved-random-access DDIMM, each built from LPDDR5 x16 @ 8000 devices behind an OMI buffer chip on OMI-32G]
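The Channel Bandwidth row of slide 4's table follows from lanes per channel x transfer rate / 8 bits per byte. Below is a minimal Python sketch of that arithmetic. It assumes 1 GByte = 10^9 bytes and sums the Read and Write lanes. The names `INTERFACES` and `channel_bandwidth_gbytes` are hypothetical. With these inputs HBM2E works out to 409.6 GBytes/s, which the table lists as 400 GBytes/s.

```python
# Peak channel bandwidth = data lanes x transfer rate (MT/s) / 8 bits per byte.
# Lane counts and rates are copied from the slide 4 table; R + W lanes are summed.
INTERFACES = {
    # name: (data lanes per channel, transfer rate in MT/s)
    "LRDIMM DDR4": (64, 3_200),
    "DDR5": (32, 6_400),
    "HBM2E (8-High)": (512 + 512, 3_200),
    "OMI": (8 + 8, 32_000),
}


def channel_bandwidth_gbytes(lanes: int, mtps: int) -> float:
    """Peak channel bandwidth in GBytes/s (1 GByte = 1e9 bytes)."""
    return lanes * mtps * 1e6 / 8 / 1e9


for name, (lanes, mtps) in INTERFACES.items():
    print(f"{name:15s} {channel_bandwidth_gbytes(lanes, mtps):6.1f} GBytes/s")
```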
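Slide 6 estimates the Rome IO die's width and height from the die shot's pixel aspect ratio and the published 416 mm² area. It then divides DDR4 channel bandwidth by controller area. Here is a short Python sketch of both steps. It assumes the die shot is undistorted, so the pixel ratio equals the physical ratio. `die_dimensions` is a hypothetical helper name.

```python
import math


def die_dimensions(px_w: float, px_h: float, area_mm2: float) -> tuple[float, float]:
    """Solve Y * (aspect * Y) = area for an image whose aspect ratio matches the die."""
    aspect = px_w / px_h               # X = aspect * Y
    y = math.sqrt(area_mm2 / aspect)   # Y = sqrt(area / aspect)
    return aspect * y, y


x_mm, y_mm = die_dimensions(800, 436, 416)
print(f"aspect {800 / 436:.4f}: X ~ {x_mm:.2f} mm, Y ~ {y_mm:.2f} mm")

# DDR4 controller strip measured on the die shot: 2.2 mm x 14.2 mm for 4 channels.
ctrl_area = 2.2 * 14.2                 # mm^2 for all 4 channels
per_channel = ctrl_area / 4            # mm^2 per channel
bw_per_channel = 3200e6 * 8 / 1e9      # 3200 MT/s x 8 bytes = 25.6 GBytes/s
print(f"{ctrl_area:.2f} mm^2 total, {per_channel:.2f} mm^2/channel, "
      f"{bw_per_channel / per_channel:.2f} GBytes/s/mm^2")
```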
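Slides 7, 9 and 11 compare each IO die's aggregate memory bandwidth with its Infinity Fabric and PCIe bandwidth. A minimal Python sketch of that comparison follows, using the per-link and per-DIMM figures from those slides. The fabric totals are the slides' own "?" estimates, and `IODieConfig` is a hypothetical name. With these inputs, the combined Rome ratio works out to about 1:6.4. The OMI concept comes out at 1:1, 1:0.67 and 1:1.67.

```python
from dataclasses import dataclass


@dataclass
class IODieConfig:
    """Aggregate peak Read + Write bandwidths (GBytes/s) and capacity (GB) per socket."""
    name: str
    memory_gbs: float
    fabric_gbs: float   # Infinity Fabric totals are estimates on the slides
    pcie_gbs: float
    capacity_gb: float

    def ratios(self) -> tuple[float, float, float]:
        """Return fabric, PCIe and fabric + PCIe bandwidth per unit of memory bandwidth."""
        m = self.memory_gbs
        return (self.fabric_gbs / m, self.pcie_gbs / m,
                (self.fabric_gbs + self.pcie_gbs) / m)


CONFIGS = [
    # Slide 7: 8x DDR4-3200, 4x 200 GBytes/s fabric, 128 PCIe-G4 lanes at 4 GB/s
    IODieConfig("Rome, DDR4-3200", 8 * 25.6, 4 * 200, 128 * 4, 8 * 64),
    # Slide 9: 12x DDR5-5200, 4x 384 GBytes/s fabric, 128 PCIe-G5 lanes at 8 GB/s
    IODieConfig("Genoa, DDR5-5200", 12 * 41.6, 4 * 384, 128 * 8, 12 * 256),
    # Slide 11: 24x OMI-32G DDIMMs at 64 GBytes/s each, same fabric and PCIe
    IODieConfig("Genoa + OMI concept", 24 * 64, 4 * 384, 128 * 8, 24 * 256),
]

for cfg in CONFIGS:
    fabric, pcie, both = cfg.ratios()
    print(f"{cfg.name:20s} memory {cfg.memory_gbs:7.1f} GB/s: 1:{fabric:.2f} fabric, "
          f"1:{pcie:.2f} PCIe, 1:{both:.2f} fabric+PCIe, {cfg.capacity_gb:.0f} GB")
```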
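Slide 10 states that the OMI lane rate is locked at a fixed ratio to the DDIMM's DDR speed. The listed pairs give a ratio of 8 (e.g. DDR4-3200 -> 25.6 Gbit/s). A channel carries 8 lanes in each direction. The Python sketch below reproduces the lane rates and per-channel Tx + Rx bandwidth from those two facts. The helper names are hypothetical, and the inputs are nominal integer speed grades.

```python
LANES_PER_DIRECTION = 8   # x8 OMI channel
DIRECTIONS = 2            # Tx + Rx


def omi_lane_rate_gbps(ddr_mtps: float) -> float:
    """OMI lane rate in Gbit/s for a DDR speed grade in MT/s (8:1 ratio read off slide 10)."""
    return ddr_mtps * 8 / 1000


def omi_channel_gbytes_s(lane_gbps: float) -> float:
    """Peak Tx + Rx bandwidth of one OMI channel in GBytes/s."""
    return lane_gbps * LANES_PER_DIRECTION * DIRECTIONS / 8


for grade in (2667, 3200, 4000, 4800, 5333, 6400):
    lane = omi_lane_rate_gbps(grade)
    print(f"DDR-{grade}: {lane:5.1f} Gbit/s per lane, "
          f"{omi_channel_gbytes_s(lane):6.1f} GBytes/s per channel")

# Bandwidth density using the POWER10 per-channel controller area (1.89 mm^2).
print(f"{omi_channel_gbytes_s(32.0) / 1.89:.1f} GBytes/s/mm^2")
```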
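Slides 14 and 15 give the module's transceiver lane budget. The Python sketch below simply restates those figures as arithmetic, assuming the two groups of CXL / PCIe-G5 link widths drawn on slide 14 (16, 16, 8, 8, 8, 8 each). The constant names are hypothetical. The result matches the 320 lanes on slide 14 and the 4x x16 plus 8x x8 links on slide 15.

```python
OMI_CHANNELS = 24            # 12x E3.S modules, each a dual OMI channel
OMI_LANES_PER_CHANNEL = 8    # "8 Lane OMI Channel" in the slide 14 legend
# CXL / PCIe-G5 link widths as drawn on slide 14: two groups of 16, 16, 8, 8, 8, 8
CXL_LINK_WIDTHS = [16, 16, 8, 8, 8, 8] * 2

omi_lanes = OMI_CHANNELS * OMI_LANES_PER_CHANNEL
cxl_lanes = sum(CXL_LINK_WIDTHS)
total_lanes = omi_lanes + cxl_lanes
x16_links = CXL_LINK_WIDTHS.count(16)
x8_links = CXL_LINK_WIDTHS.count(8)

print(f"OMI {omi_lanes} + CXL/PCIe {cxl_lanes} = {total_lanes} transceiver lanes "
      f"({x16_links}x x16, {x8_links}x x8)")
```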
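The slide 19 totals can be rebuilt from the per-socket OMI figures on slide 11: 24 DDIMMs of 256 GB at 64 GBytes/s each. The Python sketch below does this under two assumptions. First, capacity uses 1 TB = 1024 GB (how 24 x 256 GB becomes 6 TBytes) while bandwidth uses 1 TB = 1000 GB (how the 12.3 TBytes/s arises). Second, the 42 TBytes "shared" figure is read as the memory one socket reaches over CXL.mem. The constant names are hypothetical.

```python
SOCKETS = 8
DDIMMS_PER_SOCKET = 24
GB_PER_DDIMM = 256
GBYTES_S_PER_DDIMM = 64

local_tb = DDIMMS_PER_SOCKET * GB_PER_DDIMM / 1024   # per socket, 1 TB = 1024 GB
pool_tb = SOCKETS * local_tb                         # whole chassis
shared_tb = pool_tb - local_tb                       # assumed: reached over CXL.mem from one socket
aggregate_tb_s = SOCKETS * DDIMMS_PER_SOCKET * GBYTES_S_PER_DDIMM / 1000

print(f"local {local_tb:.0f} TB, pool {pool_tb:.0f} TB, shared {shared_tb:.0f} TB, "
      f"aggregate {aggregate_tb_s:.1f} TBytes/s")
```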