AMD EPYC™ Microprocessor Architecture
JAY FLEISCHMAN, NOAH BECK, KEVIN LEPAK, MICHAEL CLARK
PRESENTED BY: JAY FLEISCHMAN | SENIOR FELLOW
2 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to,
the timing, features, functionality, availability, expectations, performance, and benefits of AMD's EPYC™ products, which are made
pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are
commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with
similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on current beliefs,
assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that could cause
actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and
uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other
future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and
statements. Investors are urged to review in detail the risks and uncertainties in AMD's Securities and Exchange Commission filings,
including but not limited to AMD's Annual Report on Form 10-K for the year ended December 30, 2017.
CAUTIONARY STATEMENT
3 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
IMPROVING TCO FOR ENTERPRISE AND MEGA DATACENTER
Key Tenets of EPYC™ Processor Design
CPU Performance
▪ “Zen”—high-performance x86 core with server features
Virtualization, RAS, and Security
▪ ISA, Scale-out microarchitecture, RAS
▪ Memory Encryption, no application mods: security you can use
Per-socket Capability
▪ Maximize customer value of platform investment
▪ MCM – allows silicon area > reticle area; enables feature set
▪ MCM – full product stack w/single design, smaller die, better yield
Fabric Interconnect
▪ Cache coherent, high bandwidth, low latency
▪ Balanced bandwidth within socket, socket-to-socket
Memory
▪ High memory bandwidth for throughput
▪ 16 DIMM slots/socket for capacity
I/O
▪ Disruptive 1 Processor (1P), 2P, Accelerator + Storage capability
▪ Integrated Server Controller Hub (SCH)
Power
▪ Performance or Power Determinism, Configurable TDP
▪ Precision Boost, throughput workload power management
4 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
[Block diagram: “Zen” core — branch prediction feeding a 64K 4-way I-cache and an op cache; decode at 4 instructions/cycle; micro-op queue dispatching 6 ops/cycle (the op cache path delivers up to 8 micro-ops/cycle); integer side with rename, schedulers, physical register file, 4 ALUs, and 2 AGUs; floating-point side with rename, register file, schedulers, and ADD/MUL pipes; load/store queues sustaining 2 loads + 1 store per cycle; 32K 8-way D-cache; 512K 8-way L2 (I+D) cache.]
“ZEN” MICROARCHITECTURE
▪ Fetch four x86 instructions per cycle
▪ Op Cache 2K instructions
▪ 4 Integer units
‒ Large rename space – 168 Registers
‒ 192 instructions in flight/8 wide retire
▪ 2 Load/Store units
‒ 72 Out-of-Order Loads supported
‒ Smart prefetch
▪ 2 Floating Point units with 128-bit FMACs
‒ Built as 4 pipes: 2 FADD, 2 FMUL
‒ 2 AES units
‒ SHA instruction support
▪ I-Cache 64K, 4-way
▪ D-Cache 32K, 8-way
▪ L2 Cache 512K, 8-way
▪ Large shared L3 cache
▪ 2 threads per core
5 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
[Block diagram: the “Zen” pipeline annotated per-structure by SMT sharing policy — each structure (branch prediction, op cache, L0/L1/L2 ITLB, L1/L2 DTLB, micro-op queue, rename, schedulers, register files, retire queue, load/store queues, caches) is marked as competitively shared, competitively shared with algorithmic priority, competitively shared and SMT tagged, statically partitioned, or vertically threaded.]
SMT OVERVIEW
▪ All structures fully available in 1T mode
▪ Front End Queues are round robin
with priority overrides
▪ Increased throughput from SMT for data
center workloads
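The round-robin-with-priority-overrides policy for the front-end queues can be sketched as a tiny arbiter. This is an illustrative model only, not AMD's implementation; the interface and override mechanism are assumptions:

```python
def pick_thread(last, ready, override=None):
    """Pick which of two SMT threads (0 or 1) gets the shared resource
    this cycle: honor a priority override first, else round-robin."""
    if override is not None and override in ready:
        return override
    first = 1 - last            # round-robin: prefer the thread not picked last
    for t in (first, last):
        if t in ready:
            return t
    return None                 # neither thread has work this cycle
```

With both threads ready, the arbiter simply alternates; an override (e.g. a thread about to starve) wins when it is ready.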
6 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
[Diagram: per-core cache hierarchy — 64K 4-way I-cache (32B fetch), 32K 8-way D-cache (2×16B loads + 1×16B store per cycle), 512K 8-way private L2 (I+D), and 8M 16-way shared L3 (I+D), connected by 32B/cycle datapaths.]
“ZEN” CACHE HIERARCHY
▪ Fast private L2 cache, 12 cycles
▪ Fast shared L3 cache, 35 cycles
▪ L3 filled with victims of the L2
▪ L2 tags duplicated in L3 for probe
filtering and fast cache transfer
▪ Multiple smart prefetchers
▪ 50 outstanding misses supported
from L2 to L3 per core
▪ 96 outstanding misses supported
from L3 to memory
HIGH BANDWIDTH, LARGE QUEUES FOR DATA CENTER
7 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
CPU COMPLEX
▪ A CPU complex (CCX) is four cores
connected to an L3 Cache
▪ The L3 Cache is 16-way associative,
8MB, mostly exclusive of L2
▪ The L3 Cache is made of 4 slices, by
low-order address interleave
▪ Every core can access every cache
with same average latency
▪ Utilize upper level metals for high
frequency data transfer
▪ Multiple CCXs instantiated at SoC
level
[Floorplan: CCX with cores 0–3 surrounding the shared L3 — each core pairs with its 512K L2 macro (L2M) and L2 control (L2 CTL), and two 1MB L3 macros (L3M) with L3 control (L3 CTL) per core make up the 8MB L3.]
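The low-order address interleave across the four L3 slices can be sketched as below. The exact slice-select bit positions are an assumption for illustration; the principle is that consecutive cache lines rotate through the slices, spreading bandwidth evenly:

```python
LINE_BYTES = 64      # cache line size
NUM_SLICES = 4       # L3 slices per CCX

def l3_slice(phys_addr: int) -> int:
    # Drop the 6 line-offset bits, then take the two low-order bits
    # to select one of four slices (assumed bit positions).
    return (phys_addr >> 6) % NUM_SLICES
```

Consecutive 64-byte lines map to slices 0, 1, 2, 3, 0, 1, …, while all bytes within one line hit the same slice.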
8 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
HIGH LEVEL COMPETITIVE ANALYSIS

CORE COMPARISON
Resource         Skylake-SP   “Bulldozer” Family   “Zen”
Uops delivered   6            4                    8
Issue Width      8            7 (4+3)              10 (6+4)
INT Reg          180          112                  168
INT Sched        97           48                   84
FP Reg           168          176                  160
ROB              224          128                  192
FADD/FMUL/FMA    4/4/4        5/5/5                3/4/5
FP Width         512          128                  128

CACHE COMPARISON
Resource           Skylake-SP   “Bulldozer” Family   “Zen”
LD Q               72           48                   72
ST Q               56           32                   44
Micro-op Cache     1.5K         0                    2K
L1 Icache          32K          96K                  64K
L1 Dcache          32K          32K                  32K
L2 Cache           1M           2M                   512K
L3 Cache/per core  1.33M        2M                   2M
L2 TLB Latency     7            25                   7
L2 Latency         14           19                   12
L3 Latency         50+          60                   35
9 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
“ZEN”
▪ Totally new high-performance core design
▪ New high-bandwidth, low-latency cache system
▪ Simultaneous Multithreading (SMT) for high throughput
▪ Energy-efficient FinFET design for enterprise-class products
Designed from the ground up for an optimal balance of performance and power for the datacenter
ZEN DELIVERS GREATER THAN 52% IPC* (vs prior AMD processors)
10 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
VIRTUALIZATION ISA AND MICROARCHITECTURE ENHANCEMENTS

ISA Feature                             Notes                                        “Bulldozer”   “Zen”
Data Poisoning                          Better handling of memory errors             —             ✓
AVIC                                    Advanced Virtual Interrupt Controller        —             ✓
Nested Virtualization                   Nested VMLOAD, VMSAVE, STGI, CLGI            —             ✓
Secure Memory Encryption (SME)          Encrypted memory support                     —             ✓
Secure Encrypted Virtualization (SEV)   Encrypted guest support                      —             ✓
PTE Coalescing                          Combines 4K page tables into 32K page size   —             ✓
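PTE coalescing can be illustrated with a simple check — a sketch of the principle, not AMD's hardware logic: eight contiguous, aligned 4K translations with identical attributes can share one 32K TLB entry:

```python
GROUP = 8    # 8 x 4K pages = one 32K TLB entry

def can_coalesce(ptes):
    """ptes: list of (virt_page_number, phys_page_number, attrs) for 8
    consecutive virtual pages. True if one 32K TLB entry can cover them."""
    if len(ptes) != GROUP:
        return False
    v0, p0, a0 = ptes[0]
    if v0 % GROUP or p0 % GROUP:      # both mappings must be 32K-aligned
        return False
    # Pages must be contiguous in both spaces with identical attributes
    return all(v == v0 + i and p == p0 + i and a == a0
               for i, (v, p, a) in enumerate(ptes))
```

One hole in the mapping, or misalignment of the first page, and the group falls back to eight separate 4K entries.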
“Zen” Core built for the Data Center — Microarchitecture Features
▪ Reduction in World Switch latency
▪ Dual Translation Table engines for TLBs
▪ Tightly coupled L2/L3 cache for scale-out (private L2: 12 cycles; shared L3: 35 cycles)
▪ Cache + memory topology reporting for Hypervisor/OS scheduling
11 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
HARDWARE MEMORY ENCRYPTION
SECURE MEMORY ENCRYPTION (SME)
▪ Protects against physical memory attacks
▪ Single key is used for encryption of
system memory
‒ Can be used on systems with Virtual Machines (VMs) or Containers
▪ OS/Hypervisor chooses pages to encrypt
via page tables or enabled transparently
▪ Support for hardware devices (network, storage,
graphics cards) to access encrypted pages
seamlessly through DMA
▪ No application modifications required
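The OS/hypervisor marks a page as encrypted through a bit in its page-table entry (the "C-bit"). Its position is model-specific and reported via CPUID; bit 47 below is an assumption for illustration only:

```python
C_BIT = 47    # enCrypted-bit position; model-specific, reported via CPUID (assumed here)

def mark_encrypted(pte: int) -> int:
    # Setting the C-bit tells the memory controller to encrypt this page
    return pte | (1 << C_BIT)

def is_encrypted(pte: int) -> bool:
    return bool((pte >> C_BIT) & 1)

def phys_addr(pte: int) -> int:
    # Strip the C-bit to recover the true physical address
    return pte & ~(1 << C_BIT)
```

Because the marker lives in the address bits of the PTE, existing page-table walks and DMA paths work unchanged — consistent with the "no application modifications" point above.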
Defense Against Unauthorized Access to Memory
[Diagram: OS/hypervisor, VM/sandbox, and containers running applications above DRAM, with a single key in the memory controller's AES-128 engine encrypting all traffic.]
12 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
HARDWARE MEMORY ENCRYPTION
SECURE ENCRYPTED VIRTUALIZATION (SEV)
▪ Protects VMs/Containers from each other,
administrator tampering, and untrusted
Hypervisor
▪ One key for Hypervisor and one key per VM,
groups of VMs, or VM/Sandbox with multiple
containers
▪ Cryptographically isolates the hypervisor from
the guest VMs
▪ Integrates with existing AMD-V technology
▪ System can also run unsecure VMs
▪ No application modifications required
[Diagram: as in SME, but the AES-128 engine holds one key for the OS/hypervisor and separate keys per VM, per VM group, or per sandbox of containers.]
13 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
BUILT FOR SERVER: RAS (RELIABILITY, AVAILABILITY, SERVICEABILITY)
[Diagram: Data Poisoning — (A) an uncorrectable DRAM error is recorded as poison in the cache and signaled only when Core A consumes the data; (B) Core B's normal access to good data proceeds unaffected. The uncorrectable error is limited to the consuming process.]
Enterprise RAS Feature Set
▪ Caches
‒ L1 data cache SEC-DED ECC
‒ L2+L3 caches with DEC-TED ECC
▪ DRAM
‒ DRAM ECC with x4 DRAM device failure correction
‒ DRAM Address/Command parity, write CRC—with replay
‒ Patrol and demand scrubbing
‒ DDR4 post package repair
▪ Data Poisoning and Machine Check Recovery
▪ Platform First Error Handling
▪ Infinity Fabric link packet CRC with retry
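The SEC-DED protection on the L1 data cache can be illustrated with a toy Hamming(7,4) code plus an overall parity bit — a sketch of the coding principle only, not the ECC layout AMD uses:

```python
def encode(nibble):
    """SEC-DED encode 4 data bits into an 8-bit codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]                       # covers positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]                       # covers positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]                       # covers positions 4,5,6,7
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]   # Hamming positions 1..7
    overall = 0
    for b in bits:
        overall ^= b                              # overall parity for DED
    return bits + [overall]

def decode(code):
    """Return (nibble, status): 'ok', 'corrected', or 'double' (uncorrectable)."""
    bits = list(code[:7])
    s = 0
    for pos in range(1, 8):                       # syndrome = XOR of set-bit positions
        if bits[pos - 1]:
            s ^= pos
    overall = 0
    for b in code:
        overall ^= b
    if s and overall == 0:
        return None, 'double'                     # detected, not correctable -> poison
    status = 'ok'
    if s:
        bits[s - 1] ^= 1                          # flip the single bad bit
        status = 'corrected'
    d = [bits[2], bits[4], bits[5], bits[6]]
    return sum(b << i for i, b in enumerate(d)), status
```

A double-bit error is detected but not corrected — exactly the case that feeds the data-poisoning flow above instead of silently corrupting data.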
14 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
MCM VS. MONOLITHIC
▪ MCM approach has many advantages
‒ Higher yield, enables increased feature-set
‒ Multi-product leverage
▪ EPYC Processors
‒ 4x 213mm2 die/package = 852mm2 Si/package*
▪ Hypothetical EPYC Monolithic
‒ ~777mm2*
‒ Remove die-to-die Infinity Fabric On-Package
(IFOP) PHYs and logic (4/die), duplicated logic, etc.
‒ 852mm2 / 777mm2 = ~10% MCM area overhead
* Die sizes as used for Gross-Die-Per-Wafer
MCM Delivers Higher Yields, Increases Peak Compute, Maximizes Platform Value
▪ Inverse-exponential reduction in yield
with die size
‒ Multiple small dies has inherent yield advantage
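The yield argument can be sketched with a simple Poisson yield model. The defect density below is purely illustrative — not AMD's data — and test/assembly costs are ignored:

```python
import math

D0 = 0.2   # assumed defect density, defects/cm^2 (illustrative only)

def die_yield(area_mm2, d0=D0):
    # Poisson yield model: Y = exp(-D0 * A), with A in cm^2
    return math.exp(-d0 * area_mm2 / 100.0)

# Silicon cost per good die scales as area / yield, since dies are
# tested before package assembly.
mcm_silicon  = 4 * 213 / die_yield(213)   # four 213 mm^2 dies per package
mono_silicon = 777 / die_yield(777)       # hypothetical monolithic die
```

Even though the MCM carries ~10% more total silicon, the exponential yield penalty on a reticle-scale monolithic die makes the four-die package cheaper per good 32-core part.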
[Diagram: four-die MCM — each die with two CCXs, DDR, and I/O — versus a hypothetical monolithic 32-core die with eight CCXs, DDR, and I/O. Relative 32-core die cost: monolithic 1.0x, MCM 0.59x¹.]
15 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
CHIP FLOORPLANNING FOR PACKAGE
▪ DDR placement on one die edge
▪ Chips in 4-die MCM rotated 180°, DDR facing package
left/right edges
▪ Package-top Infinity Fabric pinout requires diagonal
placement of IFIS
▪ 4th IFOP enables routing
of high-speed I/O in only
four package substrate
layers
[Die floorplan: two CCXs (four “Zen” cores + L3 each) in the center; DDR PHYs on one die edge; IFOP PHYs flanking an IFIS/PCIe interface on one side and an IFIS/PCIe/SATA interface on the other.]
16 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
ADVANCED PACKAGING
▪ 58mm x 75mm organic package
▪ 4 die-to-die IFOP links/die
‒ 3 connected/die
▪ 2 SERDES/die to pins
‒ 1/die to “top” (G) and “bottom” (P) links
‒ Balanced I/O for 1P, 2P systems
▪ 2 DRAM-channels/die to pins
Purpose-built MCM Architecture
Legend — G[0-3], P[0-3]: 16-lane high-speed SERDES links; M[A-H]: 8 x72b DRAM channels; die-to-die Infinity Fabric On-Package links; CCX: 4 cores + L2/core + shared L3 complex.
[Package diagram: die-to-package topology mapping — the four dies (each with two CCXs, DDR, and I/O) are placed so their DRAM channels MA–MH face the package left/right edges, while SERDES links G0–G3 exit the package top and P0–P3 the bottom.]
17 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
CHIP ARCHITECTURE
Infinity Fabric
[Diagram: one die's Scalable Data Fabric plane connecting two CCXs (4 cores + L3) through CCM interfaces; CAKEs bridging to the die-to-die IFOP links and to the IFIS/PCIe and IFIS/PCIe/SATA off-package links; two UMCs driving the DDR channels; an IOMS serving the I/O complex (PCIe, SATA, southbridge); plus the SMU and Scalable Control Fabric (SCF).]
18 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
INFINITY FABRIC: MEMORY SYSTEM
Optimized Memory Bandwidth
and Capacity per Socket
▪ 8 DDR4 channels per socket
▪ Up to 2 DIMMs per channel, 16 DIMMs per socket
▪ Up to 2667MT/s: 21.3GB/s per channel, 171GB/s per socket
‒ Achieves 145GB/s per socket, 85% efficiency*
▪ Up to 2TB capacity per socket (8Gb DRAMs)
▪ Supported memory
‒ RDIMM
‒ LRDIMM
‒ 3DS DIMM
‒ NVDIMM-N
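The bandwidth figures follow directly from the channel math — a quick check of the numbers above:

```python
MTS       = 2667   # DDR4-2667: million transfers per second
BYTES_PER = 8      # 64-bit data channel -> 8 bytes per transfer
CHANNELS  = 8      # DDR4 channels per socket

per_channel_gbs = MTS * BYTES_PER / 1000       # ~21.3 GB/s per channel
per_socket_gbs  = per_channel_gbs * CHANNELS   # ~171 GB/s per socket
efficiency      = 145 / per_socket_gbs         # measured 145 GB/s -> ~85%
```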
19 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
I/O SUBSYSTEM
[Diagram: high-speed link (SERDES) capabilities — each x16 combo link (Type “A”: PHYs A0–A4; Type “B”: PHYs B0–B4) can be configured as an xGMI link or bifurcated into PCIe and SATA ports of width x8, x4, x2, or x1.
Restrictions:
1) Ports from a single PHY must be of the same type (one of xGMI, PCIe, SATA).
2) There are a maximum of 8 PCIe ports in any 16-lane combo link.
3) Ports from PHY A0 and PHY A1 must all be of the same type (one of xGMI, PCIe, SATA).]
Architected for Massive I/O Systems with
Leading-edge Feature Set, Platform Flexibility
▪ 8 x16 links available in 1 and 2 socket systems
‒ Link bifurcation support; max of 8 PCIe® devices per x16
‒ Socket-to-Socket Infinity Fabric (IFIS), PCIe, and SATA support
▪ 32GB/s bi-dir bandwidth per link, 256GB/s per socket
▪ PCIe Features*
‒ Advanced Error Reporting (AER)
‒ Enhanced Downstream Port Containment (eDPC)
‒ Non-Transparent Bridging (NTB)
‒ Separate RefClk with Independent Spread support (SRIS)
‒ NVMe and Hot Plug support
‒ Peer-to-peer support
▪ Integrated SATA support
* = See PCI-SIG for details
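The bifurcation restrictions amount to a small configuration check. This is a sketch; the PHY names and the port-list interface are illustrative, not a firmware API:

```python
def valid_combo_link(phy_ports):
    """phy_ports: dict mapping PHY name -> list of port types on that PHY,
    e.g. {'A0': ['PCIe'] * 4}. Checks the three pinout restrictions."""
    allowed = {'xGMI', 'PCIe', 'SATA'}
    for ports in phy_ports.values():
        types = set(ports)
        if not types <= allowed or len(types) > 1:
            return False                              # (1) one type per PHY
    pcie = sum(p == 'PCIe' for ports in phy_ports.values() for p in ports)
    if pcie > 8:
        return False                                  # (2) max 8 PCIe ports per x16
    a01 = [t for phy in ('A0', 'A1') if phy in phy_ports
             for t in phy_ports[phy]]
    if len(set(a01)) > 1:
        return False                                  # (3) A0 and A1 must match
    return True
```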
20 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
EPYC™—1P I/O PERFORMANCE
▪ High bandwidth,
high efficiency PCIe®
‒ Local DMA, Peer-to-Peer (P2P)
High Performance I/O and Storage

Achieved 1x16 PCIe Bandwidth* (efficiency vs 16GB/s per direction):

DMA            Local DRAM (GB/s / efficiency¹)   Die-to-Die Infinity Fabric, 1-hop (GB/s / efficiency¹)
Read           12.2 / 76%                        12.1 / 76%
Write          13.0 / 81%                        14.2 / 88%
Read + Write   22.8 / 71%                        22.6 / 70%
P2P Write      NA                                13.6 / 85%

1: Peak efficiency of PCIe is ~85%-95% due to protocol overheads

▪ High performance storage: 9.2M IOPs*
‒ EPYC™ 7601 1P, 16 x Samsung pm1725a NVMe, Ubuntu 17.04, FIO v2.16, 4KB 100% read
‒ ~285K IOPs/core with 32 cores
‒ 1P direct-connect 24 drives x4 (no PCIe switches)
21 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
POWER MANAGING I/O
Dynamic Link Width Modulation
▪ The unprecedented connectivity that EPYC™ provides can consume additional power
▪ Full bandwidth is not required for all workloads
▪ Dynamic Link Width Modulation tracks the bandwidth demand of Infinity Fabric connections between sockets and adapts link width to the need
▪ Up to 8% performance/Watt gains at the CPU socket*
[Diagram: 2P system — 64 lanes of socket-to-socket Infinity Fabric, with 64 lanes of high-speed I/O per socket.]
*See Endnotes
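Dynamic Link Width Modulation can be sketched as a hysteresis policy. The thresholds, widths, and interface below are assumptions for illustration, not AMD's algorithm:

```python
def pick_link_width(utilization, width, low=0.3, high=0.7):
    """Hysteresis sketch: drop the inter-socket link to half width when
    utilization (fraction of current bandwidth) stays low, restore full
    width when demand runs high."""
    if utilization > high and width < 16:
        return 16            # demand is high: widen back to x16
    if utilization < low and width > 8:
        return 8             # demand is low: narrow to x8 to save power
    return width             # inside the hysteresis band: hold steady
```

The hysteresis band keeps the link from thrashing between widths on bursty traffic, trading a little latency on upshift for steady power savings.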
22 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
POWER: PERFORMANCE-DETERMINISM, POWER-DETERMINISM,
CONFIGURABLE TDP
▪ Server systems and environments vary
▪ Silicon varies
‒ Faster / higher leakage parts
‒ Slower / lower leakage parts
▪ Some customers demand repeatable, deterministic
performance for all environments
‒ In this mode, power consumption will vary
▪ Others may want maximum performance with
fixed power consumption
‒ In this mode, performance will vary
▪ EPYC™ CPUs allow boot-time choice of either mode
[Charts: normalized performance and power vs. system and silicon margin (max → min). In power-determinism mode, performance varies (~0.96–1.06) at fixed power; in performance-determinism mode, power varies at fixed performance.]
Product TDP   Low range   High range
180W          165W        200W
155W          135W        170W
120W          105W        120W
▪ EPYC allows configurable TDP for TCO and peak
performance vs performance/Watt optimization
23 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
POWER: PER-CORE LINEAR VOLTAGE REGULATION
▪ Running all dies/cores at the voltage required by
the slowest wastes power
▪ Per-core voltage regulator capabilities enable
each core’s voltage to be optimized
‒ Significant core power savings and variation reduction
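The per-core LDO saving can be estimated from the roughly V² dependence of dynamic power at fixed frequency (CV²f). All voltages and power numbers below are hypothetical, for illustration only:

```python
def core_power(v, v_max=1.0, p_max=5.0):
    # Dynamic power scales roughly with V^2 at fixed frequency (CV^2 f);
    # p_max is an illustrative per-core power at v_max.
    return p_max * (v / v_max) ** 2

per_core_vmin = [0.85, 0.90, 0.95, 0.88]   # hypothetical per-core minimum voltages

# Shared rail: every core must run at the worst (highest) required voltage
shared_rail  = sum(core_power(max(per_core_vmin)) for _ in per_core_vmin)
# Per-core LDO: each core drops its supply to its own minimum
per_core_ldo = sum(core_power(v) for v in per_core_vmin)
savings = 1 - per_core_ldo / shared_rail
```

With this spread of core voltages the sketch yields roughly 11% lower core power, and it also narrows core-to-core variation — the two effects named above.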
[Diagram: per-core LDO control — a single VRM output (RVDD) feeds all four dies; each of Core0–Core31 receives its own LDO-regulated supply (VDD0…VDD31).]
24 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
[Chart: STREAM Triad memory bandwidth (GB/s, 0–350 scale), EPYC 7601 1P vs 2P.]
DELIVERED CPU PERFORMANCE
▪ Memory bandwidth and scaling
‒ 97% Bandwidth scaling 1P->2P
‒ Bandwidth for Disruptive 1P and 2P performance
▪ INT and FP performance
‒ Memory bandwidth for demanding FP
applications
‒ Competitive INT and FP performance
‒ Leadership TCO
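The chart's workload is the STREAM Triad kernel, shown below alongside the 1P→2P scaling computed from the endnote scores (147 and 290 GB/s):

```python
def triad(a, b, c, s):
    # STREAM Triad kernel: a[i] = b[i] + s * c[i]
    for i in range(len(a)):
        a[i] = b[i] + s * c[i]

scaling = 290 / 147   # 2P vs 1P STREAM Triad scores from the endnotes -> ~1.97x
```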
Optimizing Compiler (see endnotes for calculations):

2P                       AMD EPYC 7601            INTEL XEON 8176          EPYC Uplift
                         64 cores, 2.2 GHz,       56 cores, 2.1 GHz,
                         HP DL385 Gen10, AOCC     HP DL380 Gen10, ICC
SPECrate®2017_int_base   272                      254                      +7%
SPECrate®2017_fp_base    259                      225                      +15%
Peak Mem BW              8x2666                   6x2666                   +33%
MSRP                     $4200                    $8719                    Better Value

EPYC Processors Deliver Compelling Performance, Memory Bandwidth, and Value
25 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
AMD EPYC™ 7000
SERIES PROCESSOR
Architected for Enterprise and
Datacenter Server TCO
Up to 32 High-Performance “Zen” Cores
8 DDR4 Channels per CPU
Up to 2TB Memory per CPU
128 PCIe® Lanes
Dedicated Security Subsystem
Integrated Chipset for 1-Chip Solution
Socket-Compatible with Next Gen EPYC
Processors
26 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
GLOSSARY OF TERMS
“Bulldozer” = AMD’s previous family of high performance CPU cores
CAKE = Coherent AMD SocKet Extender
CCX = Tightly coupled Core + Cache Complex: 4 cores + L2 cache/core + shared L3 cache
CLGI = Clear Global Interrupt
DIMM = Dual Inline Memory Module
ECC = Error Checking and Correcting
SEC-DED = Single Error Correct, Double Error Detect
DEC-TED = Double Error Correct, Triple Error Detect
GPU = Graphics Processing Unit
IPC = Instructions Per Clock
IFIS = Infinity Fabric Inter-Socket
IFOP = Infinity Fabric On-Package
LDO = Low Drop Out, a linear voltage regulator
MCM = Multi-Chip Module
NUMA = Non-Uniform Memory Architecture
OS = Operating System
Post Package Repair = Utilize internal DRAM array redundancy to map out
errant locations
PSP = Platform Security Processor
RAS = Reliability, Availability, Serviceability
SATA = Serial ATA device connection
SCH = Server Controller Hub
SCM = Single Chip Module
Scrubbing = Searching for, and proactively correcting, ECC errors
SERDES = Serializer/Deserializer
STGI = Set Global Interrupt
TCO = Total Cost of Ownership
TDP = Thermal Design Power
Topology Reporting = Describing cache and memory layout via
ACPI/Software structures for OS/Hypervisor task scheduling optimization
VM = Virtual Machine
VRM = Voltage Regulator Module
World Switch = Switch between Guest Operating System and Hypervisor
27 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
ENDNOTES
SLIDE 9
Updated Feb 28, 2017: Generational IPC uplift for the “Zen” architecture vs. “Piledriver” architecture is +52% with an estimated SPECint_base2006 score compiled with GCC 4.6 –O2 at a fixed 3.4GHz. Generational
IPC uplift for the “Zen” architecture vs. “Excavator” architecture is +64% as measured with Cinebench R15 1T, and also +64% with an estimated SPECint_base2006 score compiled with GCC 4.6 –O2, at a fixed
3.4GHz. System configs: AMD reference motherboard(s), AMD Radeon™ R9 290X GPU, 8GB DDR4-2667 (“Zen”)/8GB DDR3-2133 (“Excavator”)/8GB DDR3-1866 (“Piledriver”), Ubuntu Linux 16.x (SPECint_base2006
estimate) and Windows® 10 x64 RS1 (Cinebench R15). SPECint_base2006 estimates: “Zen” vs. “Piledriver” (31.5 vs. 20.7 | +52%), “Zen” vs. “Excavator” (31.5 vs. 19.2 | +64%). Cinebench R15 1t scores: “Zen” vs.
“Piledriver” (139 vs. 79 both at 3.4G | +76%), “Zen” vs. “Excavator” (160 vs. 97.5 both at 4.0G| +64%). GD-108
SLIDE 14
Based on AMD internal yield model using historical defect density data for mature technologies.
SLIDE 18
In AMD internal testing on STREAM Triad, 2 x EPYC 7601 CPU in AMD “Ethanol” reference system, Ubuntu 16.04, Open64 v4.5.2.1 compiler suite, 512 GB (16 x 32GB 2Rx4 PC4-2666) memory, 1 x 500 GB SSD scored
290. NAP-22
SLIDE 20
In AMD internal testing on IO Bandwidth benchmarks. 2x EPYC 7601 in AMD “Ethanol” reference systems, BIOS “TDL0080F” (default settings), Ubuntu 15.04.2, 512GB (16 x 32GB 2Rx4 PC4-2666) memory. Read and
Write tests used AMD FirePro W7100 GPUs, 256B payload size, Driver “amdgpu-pro version 17.10-450821”. Read + Write tests used Mellanox IB 100Gbps Adapter, 512B payload size, Driver “OFED 3.3-1.04”. Read
and Write DMA tests performance using AMD SST (System Stress Tool). Write peer-to-peer (P2P) tests performance using AMD P2P (Peer to Peer) tool. Read + Write DMA tests performance using OFED qperf tool.
1 x EPYC 7601 CPU in HPE Cloudline CL3150, Ubuntu 17.04 4.10 kernel (Scheduler changed to NOOP, CPU governor set to performance), 256 GB (8 x 32GB 2Rx4 PC4-2666) memory, 24 x Samsung pm1725a NVMe
drives (with only 16 enabled): FIO v2.16 (4 Jobs per drive, IO Depth of 32, 4K block size) Average Read IOPs 9,178,000 on 100% Read Test (Average BW 35.85 GB/s); FIO (4 jobs per drive, IO depth of 10, 4K block
size) Average Write IOPs 7,111,000 on 100% Write Test (Average BW 27.78 GB/s). Each run was done for 30 seconds with a 10 second ramp up using 16 NVMe drives. NAP-24
SLIDE 21
EPYC 7601 DVT part on AMD Ethanol 2P system, 32GB x16 (512GB) DR-2667 full pop 1of1, 1 500GB SSD, 1 1GBit Nic, Ubuntu 16.04 noGUI. Feature off gets an estimated score of 1297, feature on gets an estimated
score of 1321. Selectable power used to increase from 180W to 195W with feature-off yields the same feature-on estimated score of 1321 indicating an effective 8% perf/Watt improvement at the socket level for
feature-on (195W/180W).
28 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
ENDNOTES
SLIDE 24
MSRP for AMD EPYC 7601 of $4200 and for Intel Platinum 8176 of $8719 based on Anandtech https://www.anandtech.com/show/11544/intel-skylake-ep-vs-amd-epyc-7000-cpu-battle-of-the-decade/10
AMD EPYC 7601 1-socket to 2-socket memory bandwidth scaling achieves up to 1.97x. In AMD internal testing on STREAM Triad, 1 x EPYC 7601 CPU in AMD “Ethanol” reference system, Ubuntu 16.04, Open64
v4.5.2.1 compiler suite, 256 GB (8 x 32GB 2Rx4 PC4-2666) memory, 1 x 500 GB SSD scored 147; 2 x EPYC 7601 CPU in AMD “Ethanol” reference system, Ubuntu 16.04, Open64 v4.5.2.1 compiler suite, 512 GB (16 x
32GB 2Rx4 PC4-2666) memory, 1 x 500 GB SSD scored 290
AMD EPYC 7601 CPU-based 2-socket system scored 272 on SPECrate®2017_int_base, 7% higher than an Intel Platinum 8176-based 2-socket system which scored 254, based on publications on www.spec.org as of
March 7, 2018. 2 x EPYC 7601 CPU in HPE ProLiant DL385 Gen10 scored 272 on SPECrate®2017_int_base published November 2017 at www.spec.org versus 2x Intel Xeon Platinum 8176 CPU in HPE ProLiant DL380
Gen10 score of 254 published October 2017 at www.spec.org. SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.
https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171114-00833.pdf
https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171003-00078.pdf
AMD EPYC 7601 CPU-based 2-socket system scored 259 on SPECrate®2017_fp_base, 15% higher than an Intel Platinum 8176-based 2-socket system which scored 225, based on publications on www.spec.org as of
March 7, 2018. 2 x EPYC 7601 CPU in HPE ProLiant DL385 Gen10 scored 259 on SPECrate®2017_fp_base published November 2017 at www.spec.org versus 2x Intel Xeon Platinum 8176 CPU in HPE ProLiant DL380
Gen10 score of 225 published October 2017 at www.spec.org. SPEC and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.
https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171114-00845.pdf
https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171003-00077.pdf
29 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and
motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the
like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time
to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS
THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON
FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED
OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2018 Advanced Micro Devices, Inc. All rights reserved. AMD, EPYC, Radeon, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the
United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.
XPDS16: High-Performance Virtualization for HPC Cloud on Xen - Jun Nakajima &...
The Linux Foundation
 
High-Density Top-Loading Storage for Cloud Scale Applications
High-Density Top-Loading Storage for Cloud Scale Applications High-Density Top-Loading Storage for Cloud Scale Applications
High-Density Top-Loading Storage for Cloud Scale Applications
Rebekah Rodriguez
 
Red Hat Storage Day Boston - Supermicro Super Storage
Red Hat Storage Day Boston - Supermicro Super StorageRed Hat Storage Day Boston - Supermicro Super Storage
Red Hat Storage Day Boston - Supermicro Super Storage
Red_Hat_Storage
 
Presentation sparc m6 m5-32 server technical overview
Presentation   sparc m6 m5-32 server technical overviewPresentation   sparc m6 m5-32 server technical overview
Presentation sparc m6 m5-32 server technical overview
solarisyougood
 
PowerDRC/LVS 2.0.1 released by POLYTEDA
PowerDRC/LVS 2.0.1 released by POLYTEDAPowerDRC/LVS 2.0.1 released by POLYTEDA
PowerDRC/LVS 2.0.1 released by POLYTEDA
Alexander Grudanov
 
Sparc m6 32 in-memory infrastructure for the entire enterprise
Sparc m6 32 in-memory infrastructure for the entire enterpriseSparc m6 32 in-memory infrastructure for the entire enterprise
Sparc m6 32 in-memory infrastructure for the entire enterprise
solarisyougood
 
ODSA Use Case - SmartNIC
ODSA Use Case - SmartNICODSA Use Case - SmartNIC
ODSA Use Case - SmartNIC
ODSA Workgroup
 
IBM HPC Transformation with AI
IBM HPC Transformation with AI IBM HPC Transformation with AI
IBM HPC Transformation with AI
Ganesan Narayanasamy
 
The Power of HPC with Next Generation Supermicro Systems
The Power of HPC with Next Generation Supermicro Systems The Power of HPC with Next Generation Supermicro Systems
The Power of HPC with Next Generation Supermicro Systems
Rebekah Rodriguez
 
Red Hat Storage Day Seattle: Supermicro Solutions for Red Hat Ceph and Red Ha...
Red Hat Storage Day Seattle: Supermicro Solutions for Red Hat Ceph and Red Ha...Red Hat Storage Day Seattle: Supermicro Solutions for Red Hat Ceph and Red Ha...
Red Hat Storage Day Seattle: Supermicro Solutions for Red Hat Ceph and Red Ha...
Red_Hat_Storage
 
C++ Programming and the Persistent Memory Developers Kit
C++ Programming and the Persistent Memory Developers KitC++ Programming and the Persistent Memory Developers Kit
C++ Programming and the Persistent Memory Developers Kit
Intel® Software
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of Systems
Anand Haridass
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
NVIDIA Taiwan
 
Supermicro X12 Performance Update
Supermicro X12 Performance UpdateSupermicro X12 Performance Update
Supermicro X12 Performance Update
Rebekah Rodriguez
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overview
lambertt
 
Modular by Design: Supermicro’s New Standards-Based Universal GPU Server
Modular by Design: Supermicro’s New Standards-Based Universal GPU ServerModular by Design: Supermicro’s New Standards-Based Universal GPU Server
Modular by Design: Supermicro’s New Standards-Based Universal GPU Server
Rebekah Rodriguez
 
Big Data Uses with Distributed Asynchronous Object Storage
Big Data Uses with Distributed Asynchronous Object StorageBig Data Uses with Distributed Asynchronous Object Storage
Big Data Uses with Distributed Asynchronous Object Storage
Intel® Software
 

Similar to AMD EPYC™ Microprocessor Architecture (20)

Summit workshop thompto
Summit workshop thomptoSummit workshop thompto
Summit workshop thompto
 
High Performance Object Storage in 30 Minutes with Supermicro and MinIO
High Performance Object Storage in 30 Minutes with Supermicro and MinIOHigh Performance Object Storage in 30 Minutes with Supermicro and MinIO
High Performance Object Storage in 30 Minutes with Supermicro and MinIO
 
XPDS16: High-Performance Virtualization for HPC Cloud on Xen - Jun Nakajima &...
XPDS16: High-Performance Virtualization for HPC Cloud on Xen - Jun Nakajima &...XPDS16: High-Performance Virtualization for HPC Cloud on Xen - Jun Nakajima &...
XPDS16: High-Performance Virtualization for HPC Cloud on Xen - Jun Nakajima &...
 
High-Density Top-Loading Storage for Cloud Scale Applications
High-Density Top-Loading Storage for Cloud Scale Applications High-Density Top-Loading Storage for Cloud Scale Applications
High-Density Top-Loading Storage for Cloud Scale Applications
 
Red Hat Storage Day Boston - Supermicro Super Storage
Red Hat Storage Day Boston - Supermicro Super StorageRed Hat Storage Day Boston - Supermicro Super Storage
Red Hat Storage Day Boston - Supermicro Super Storage
 
Presentation sparc m6 m5-32 server technical overview
Presentation   sparc m6 m5-32 server technical overviewPresentation   sparc m6 m5-32 server technical overview
Presentation sparc m6 m5-32 server technical overview
 
PowerDRC/LVS 2.0.1 released by POLYTEDA
PowerDRC/LVS 2.0.1 released by POLYTEDAPowerDRC/LVS 2.0.1 released by POLYTEDA
PowerDRC/LVS 2.0.1 released by POLYTEDA
 
Sparc m6 32 in-memory infrastructure for the entire enterprise
Sparc m6 32 in-memory infrastructure for the entire enterpriseSparc m6 32 in-memory infrastructure for the entire enterprise
Sparc m6 32 in-memory infrastructure for the entire enterprise
 
ODSA Use Case - SmartNIC
ODSA Use Case - SmartNICODSA Use Case - SmartNIC
ODSA Use Case - SmartNIC
 
IBM HPC Transformation with AI
IBM HPC Transformation with AI IBM HPC Transformation with AI
IBM HPC Transformation with AI
 
The Power of HPC with Next Generation Supermicro Systems
The Power of HPC with Next Generation Supermicro Systems The Power of HPC with Next Generation Supermicro Systems
The Power of HPC with Next Generation Supermicro Systems
 
Red Hat Storage Day Seattle: Supermicro Solutions for Red Hat Ceph and Red Ha...
Red Hat Storage Day Seattle: Supermicro Solutions for Red Hat Ceph and Red Ha...Red Hat Storage Day Seattle: Supermicro Solutions for Red Hat Ceph and Red Ha...
Red Hat Storage Day Seattle: Supermicro Solutions for Red Hat Ceph and Red Ha...
 
C++ Programming and the Persistent Memory Developers Kit
C++ Programming and the Persistent Memory Developers KitC++ Programming and the Persistent Memory Developers Kit
C++ Programming and the Persistent Memory Developers Kit
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of Systems
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
Supermicro X12 Performance Update
Supermicro X12 Performance UpdateSupermicro X12 Performance Update
Supermicro X12 Performance Update
 
AMD K6
AMD K6AMD K6
AMD K6
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overview
 
Modular by Design: Supermicro’s New Standards-Based Universal GPU Server
Modular by Design: Supermicro’s New Standards-Based Universal GPU ServerModular by Design: Supermicro’s New Standards-Based Universal GPU Server
Modular by Design: Supermicro’s New Standards-Based Universal GPU Server
 
Big Data Uses with Distributed Asynchronous Object Storage
Big Data Uses with Distributed Asynchronous Object StorageBig Data Uses with Distributed Asynchronous Object Storage
Big Data Uses with Distributed Asynchronous Object Storage
 

More from AMD

AMD EPYC Family World Record Performance Summary Mar 2022
AMD EPYC Family World Record Performance Summary Mar 2022AMD EPYC Family World Record Performance Summary Mar 2022
AMD EPYC Family World Record Performance Summary Mar 2022
AMD
 
AMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World RecordAMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World Record
AMD
 
AMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World RecordAMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World Record
AMD
 
AMD EPYC World Records
AMD EPYC World RecordsAMD EPYC World Records
AMD EPYC World Records
AMD
 
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APUHot Chips: AMD Next Gen 7nm Ryzen 4000 APU
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
AMD
 
AMD EPYC 7002 World Records
AMD EPYC 7002 World RecordsAMD EPYC 7002 World Records
AMD EPYC 7002 World Records
AMD
 
AMD EPYC 7002 World Records
AMD EPYC 7002 World RecordsAMD EPYC 7002 World Records
AMD EPYC 7002 World Records
AMD
 
AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs
AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUsAMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs
AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs
AMD
 
AMD EPYC 100 World Records and Counting
AMD EPYC 100 World Records and CountingAMD EPYC 100 World Records and Counting
AMD EPYC 100 World Records and Counting
AMD
 
AMD EPYC 7002 Launch World Records
AMD EPYC 7002 Launch World RecordsAMD EPYC 7002 Launch World Records
AMD EPYC 7002 Launch World Records
AMD
 
Delivering the Future of High-Performance Computing
Delivering the Future of High-Performance ComputingDelivering the Future of High-Performance Computing
Delivering the Future of High-Performance Computing
AMD
 
AMD Next Horizon
AMD Next HorizonAMD Next Horizon
AMD Next Horizon
AMD
 
AMD Next Horizon
AMD Next HorizonAMD Next Horizon
AMD Next Horizon
AMD
 
AMD Next Horizon
AMD Next HorizonAMD Next Horizon
AMD Next Horizon
AMD
 
Race to Reality: The Next Billion-People Market Opportunity
Race to Reality: The Next Billion-People Market OpportunityRace to Reality: The Next Billion-People Market Opportunity
Race to Reality: The Next Billion-People Market Opportunity
AMD
 
GPU Compute in Medical and Print Imaging
GPU Compute in Medical and Print ImagingGPU Compute in Medical and Print Imaging
GPU Compute in Medical and Print Imaging
AMD
 
Enabling ARM® Server Technology for the Datacenter
Enabling ARM® Server Technology for the DatacenterEnabling ARM® Server Technology for the Datacenter
Enabling ARM® Server Technology for the Datacenter
AMD
 
Lessons From MineCraft: Building the Right SMB Network
Lessons From MineCraft: Building the Right SMB NetworkLessons From MineCraft: Building the Right SMB Network
Lessons From MineCraft: Building the Right SMB Network
AMD
 

More from AMD (18)

AMD EPYC Family World Record Performance Summary Mar 2022
AMD EPYC Family World Record Performance Summary Mar 2022AMD EPYC Family World Record Performance Summary Mar 2022
AMD EPYC Family World Record Performance Summary Mar 2022
 
AMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World RecordAMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World Record
 
AMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World RecordAMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World Record
 
AMD EPYC World Records
AMD EPYC World RecordsAMD EPYC World Records
AMD EPYC World Records
 
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APUHot Chips: AMD Next Gen 7nm Ryzen 4000 APU
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
 
AMD EPYC 7002 World Records
AMD EPYC 7002 World RecordsAMD EPYC 7002 World Records
AMD EPYC 7002 World Records
 
AMD EPYC 7002 World Records
AMD EPYC 7002 World RecordsAMD EPYC 7002 World Records
AMD EPYC 7002 World Records
 
AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs
AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUsAMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs
AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs
 
AMD EPYC 100 World Records and Counting
AMD EPYC 100 World Records and CountingAMD EPYC 100 World Records and Counting
AMD EPYC 100 World Records and Counting
 
AMD EPYC 7002 Launch World Records
AMD EPYC 7002 Launch World RecordsAMD EPYC 7002 Launch World Records
AMD EPYC 7002 Launch World Records
 
Delivering the Future of High-Performance Computing
Delivering the Future of High-Performance ComputingDelivering the Future of High-Performance Computing
Delivering the Future of High-Performance Computing
 
AMD Next Horizon
AMD Next HorizonAMD Next Horizon
AMD Next Horizon
 
AMD Next Horizon
AMD Next HorizonAMD Next Horizon
AMD Next Horizon
 
AMD Next Horizon
AMD Next HorizonAMD Next Horizon
AMD Next Horizon
 
Race to Reality: The Next Billion-People Market Opportunity
Race to Reality: The Next Billion-People Market OpportunityRace to Reality: The Next Billion-People Market Opportunity
Race to Reality: The Next Billion-People Market Opportunity
 
GPU Compute in Medical and Print Imaging
GPU Compute in Medical and Print ImagingGPU Compute in Medical and Print Imaging
GPU Compute in Medical and Print Imaging
 
Enabling ARM® Server Technology for the Datacenter
Enabling ARM® Server Technology for the DatacenterEnabling ARM® Server Technology for the Datacenter
Enabling ARM® Server Technology for the Datacenter
 
Lessons From MineCraft: Building the Right SMB Network
Lessons From MineCraft: Building the Right SMB NetworkLessons From MineCraft: Building the Right SMB Network
Lessons From MineCraft: Building the Right SMB Network
 

Recently uploaded

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 

AMD EPYC™ Microprocessor Architecture

  • 1. AMD EPYC™ Microprocessor Architecture. JAY FLEISCHMAN, NOAH BECK, KEVIN LEPAK, MICHAEL CLARK. PRESENTED BY: JAY FLEISCHMAN | SENIOR FELLOW
  • 2. 2 COOLCHIPS 21 | AMD EPYC™ MICROPROCESSOR ARCHITECTURE. CAUTIONARY STATEMENT: This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to, the timing, features, functionality, availability, expectations, performance, and benefits of AMD's EPYC™ products, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on current beliefs, assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements. Investors are urged to review in detail the risks and uncertainties in AMD's Securities and Exchange Commission filings, including but not limited to AMD's Annual Report on Form 10-K for the year ended December 30, 2017.
  • 3. IMPROVING TCO FOR ENTERPRISE AND MEGA DATACENTER. Key Tenets of EPYC™ Processor Design:
    CPU Performance ▪ "Zen": high-performance x86 core with server features
    Virtualization, RAS, and Security ▪ ISA, scale-out microarchitecture, RAS ▪ Memory encryption with no application modifications: security you can use
    Per-socket Capability ▪ Maximize customer value of platform investment ▪ MCM allows silicon area > reticle area and enables the full feature set ▪ MCM yields a full product stack with a single design, smaller die, better yield
    Fabric Interconnect ▪ Cache coherent, high bandwidth, low latency ▪ Balanced bandwidth within socket and socket-to-socket
    Memory ▪ High memory bandwidth for throughput ▪ 16 DIMM slots per socket for capacity
    I/O ▪ Disruptive 1-processor (1P), 2P, accelerator + storage capability ▪ Integrated Server Controller Hub (SCH)
    Power ▪ Performance or power determinism, configurable TDP ▪ Precision Boost, throughput-workload power management
  • 4. "ZEN" MICROARCHITECTURE
    [Block diagram: branch prediction and a 64K 4-way I-cache feed a 4-wide decoder and the op cache into the micro-op queue; 6 ops dispatched per cycle to the integer side (4 ALUs + 2 AGUs, schedulers, integer physical register file) and the floating-point side (2 MUL + 2 ADD pipes, FP register file); load/store queues sustain 2 loads + 1 store per cycle to a 32K 8-way D-cache, backed by a 512K 8-way L2]
    ▪ Fetch: four x86 instructions per cycle
    ▪ Op cache: 2K instructions
    ▪ 4 integer units ‒ large rename space: 168 registers ‒ 192 instructions in flight, 8-wide retire
    ▪ 2 load/store units ‒ 72 out-of-order loads supported ‒ smart prefetch
    ▪ Floating point: 2 × 128-bit FMACs ‒ built as 4 pipes, 2 FADD, 2 FMUL ‒ 2 AES units ‒ SHA instruction support
    ▪ I-cache 64K, 4-way ▪ D-cache 32K, 8-way ▪ L2 cache 512K, 8-way ▪ Large shared L3 cache ▪ 2 threads per core
  • 5. SMT OVERVIEW
    [Block diagram: the "Zen" pipeline annotated per structure as competitively shared, statically partitioned, competitively shared with algorithmic priority, or competitively shared and SMT-tagged; the op cache is vertically threaded]
    ▪ All structures fully available in 1T mode
    ▪ Front-end queues are round-robin with priority overrides
    ▪ Increased throughput from SMT for data-center workloads
  • 6. "ZEN" CACHE HIERARCHY
    [Block diagram: per core, a 64K 4-way I-cache (32B/cycle fetch) and a 32K 8-way D-cache (2 × 16B loads + 1 × 16B store per cycle) connect at 32B/cycle to a 512K 8-way private L2, which connects at 32B/cycle to an 8M 16-way shared L3]
    ▪ Fast private L2 cache, 12 cycles
    ▪ Fast shared L3 cache, 35 cycles
    ▪ L3 filled with victims of the L2
    ▪ L2 tags duplicated in L3 for probe filtering and fast cache transfer
    ▪ Multiple smart prefetchers
    ▪ 50 outstanding misses supported from L2 to L3 per core
    ▪ 96 outstanding misses supported from L3 to memory
    HIGH BANDWIDTH, LARGE QUEUES FOR DATA CENTER
  • 7. CPU COMPLEX
  ▪ A CPU complex (CCX) is four cores connected to an L3 cache
  ▪ The L3 cache is 16-way associative, 8MB, mostly exclusive of L2
  ▪ The L3 cache is made of 4 slices, interleaved by low-order address bits
  ▪ Every core can access every slice with the same average latency
  ▪ Upper-level metal layers are utilized for high-frequency data transfer
  ▪ Multiple CCXs are instantiated at the SoC level
  [Floorplan: four cores, each with a 512K L2 macro and L2 controller, surrounding four 2MB L3 slices (2 x 1MB macros each) with their L3 controllers]
  • 8. HIGH LEVEL COMPETITIVE ANALYSIS
  CORE COMPARISON
  Resource          Skylake-SP   “Bulldozer” Family   “Zen”
  Uops delivered    6            4                    8
  Issue Width       8            7 (4+3)              10 (6+4)
  INT Reg           180          112                  168
  INT Sched         97           48                   84
  FP Reg            168          176                  160
  ROB               224          128                  192
  FADD,FMUL,FMA     4/4/4        5/5/5                3/4/5
  FP Width          512          128                  128
  CACHE COMPARISON
  Resource          Skylake-SP   “Bulldozer” Family   “Zen”
  LD Q              72           48                   72
  ST Q              56           32                   44
  Micro-op Cache    1.5K         0                    2K
  L1 Icache         32K          96K                  64K
  L1 Dcache         32K          32K                  32K
  L2 Cache          1M           2M                   512K
  L3 Cache/core     1.33M        2M                   2M
  L2 TLB Latency    7            25                   7
  L2 Latency        14           19                   12
  L3 Latency        50+          60                   35
  • 9. “ZEN”
  ▪ Totally new high-performance core design
  ▪ New high-bandwidth, low-latency cache system
  ▪ Simultaneous Multithreading (SMT) for high throughput
  ▪ Energy-efficient FinFET design for enterprise-class products
  Designed from the ground up for an optimal balance of performance and power for the datacenter
  ZEN DELIVERS GREATER THAN 52% IPC* (vs. prior AMD processors)
  • 10. VIRTUALIZATION ISA AND MICROARCHITECTURE ENHANCEMENTS
  “Zen” core built for the data center
  ISA Features (✓ = supported in “Zen”; not present in “Bulldozer”)
  ISA Feature                              Notes                                              “Zen”
  Data Poisoning                           Better handling of memory errors                   ✓
  AVIC                                     Advanced Virtual Interrupt Controller              ✓
  Nested Virtualization                    Nested VMLOAD, VMSAVE, STGI, CLGI                  ✓
  Secure Memory Encryption (SME)           Encrypted memory support                           ✓
  Secure Encrypted Virtualization (SEV)    Encrypted guest support                            ✓
  PTE Coalescing                           Coalesces contiguous 4K pages into a 32K page size ✓
  Microarchitecture Features
  ▪ Reduction in world-switch latency
  ▪ Dual translation-table engines for TLBs
  ▪ Tightly coupled L2/L3 cache for scale-out (private L2: 12 cycles; shared L3: 35 cycles)
  ▪ Cache + memory topology reporting for hypervisor/OS scheduling
  • 11. HARDWARE MEMORY ENCRYPTION: SECURE MEMORY ENCRYPTION (SME)
  Defense against unauthorized access to memory
  ▪ Protects against physical memory attacks
  ▪ A single key is used for encryption of system memory
  ‒ Can be used on systems with Virtual Machines (VMs) or Containers
  ▪ OS/Hypervisor chooses pages to encrypt via page tables, or encryption is enabled transparently
  ▪ Hardware devices (network, storage, graphics cards) can access encrypted pages seamlessly through DMA
  ▪ No application modifications required
  [Diagram: AES-128 engine between DRAM and the OS/Hypervisor; a single key covers VMs, sandboxes, and container applications]
  • 12. HARDWARE MEMORY ENCRYPTION: SECURE ENCRYPTED VIRTUALIZATION (SEV)
  ▪ Protects VMs/Containers from each other, from administrator tampering, and from an untrusted hypervisor
  ▪ One key for the hypervisor and one key per VM, per group of VMs, or per VM/sandbox with multiple containers
  ▪ Cryptographically isolates the hypervisor from the guest VMs
  ▪ Integrates with existing AMD-V technology
  ▪ The system can also run unsecured VMs
  ▪ No application modifications required
  [Diagram: AES-128 engine between DRAM and the OS/Hypervisor; separate keys for the hypervisor and for each VM, sandbox, or container group]
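On Linux, support for the two features above can be checked from userspace. A minimal sketch, assuming the kernel exposes AMD's CPUID Fn8000_001F feature bits as the `sme` and `sev` entries in the `/proc/cpuinfo` flags line (as modern kernels do); the parsing helper itself is hypothetical, not an AMD or kernel API:

```python
# Sketch: detect SME/SEV support on Linux by parsing the CPU flags line.
# Assumes the kernel surfaces CPUID Fn8000_001F bit 0 (SME) and bit 1 (SEV)
# as the "sme"/"sev" flags in /proc/cpuinfo.

def memory_encryption_support(cpuinfo_text):
    """Return {'sme': bool, 'sev': bool} from /proc/cpuinfo content."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break  # all cores report the same flags
    return {"sme": "sme" in flags, "sev": "sev" in flags}

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(memory_encryption_support(f.read()))
    except OSError:
        print("no /proc/cpuinfo on this platform")
```

Note that a flag only indicates hardware capability; actually using SME/SEV additionally requires BIOS and hypervisor enablement.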
  • 13. BUILT FOR SERVER: RAS (RELIABILITY, AVAILABILITY, SERVICEABILITY)
  Enterprise RAS feature set
  ▪ Caches
  ‒ L1 data cache: SEC-DED ECC
  ‒ L2 + L3 caches: DEC-TED ECC
  ▪ DRAM
  ‒ DRAM ECC with x4 DRAM device failure correction
  ‒ DRAM address/command parity and write CRC, with replay
  ‒ Patrol and demand scrubbing
  ‒ DDR4 post-package repair
  ▪ Data poisoning and machine-check recovery
  ▪ Platform first-error handling
  ▪ Infinity Fabric link packet CRC with retry
  [Diagram: Data Poisoning — (A) an uncorrectable DRAM error consumed by Core A is marked as poison in the cache, limiting the error to the consuming process, while (B) Core B’s normal access to good data proceeds]
  • 14. MCM VS. MONOLITHIC
  ▪ The MCM approach has many advantages
  ‒ Higher yield; enables an increased feature set
  ‒ Multi-product leverage
  ▪ Yield falls inverse-exponentially with die size
  ‒ Multiple small dies have an inherent yield advantage
  ▪ EPYC processors
  ‒ 4 x 213mm² die/package = 852mm² Si/package*
  ▪ Hypothetical monolithic EPYC
  ‒ ~777mm²*
  ‒ Removes die-to-die Infinity Fabric On-Package (IFOP) PHYs and logic (4/die), duplicated logic, etc.
  ‒ 852mm² / 777mm² = ~10% MCM area overhead
  ▪ 32-core die cost: monolithic 1.0X vs. MCM 0.59X¹
  * Die sizes as used for Gross-Die-Per-Wafer
  [Diagram: 4-die MCM (each die with 2 CCXs, DDR, and I/O) vs. hypothetical monolithic 32-core die]
  MCM Delivers Higher Yields, Increases Peak Compute, Maximizes Platform Value
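The yield argument can be illustrated with the classic Poisson yield model, Y = exp(-D·A). The defect density below is an assumed, illustrative value, not an AMD figure; with it, the model happens to land near the slide's 0.59X relative cost:

```python
# Hedged sketch of the MCM yield advantage using the Poisson yield model
# Y = exp(-D * A). D = 0.1 defects/cm^2 is an illustrative assumption.
import math

D = 0.1  # defects per cm^2 (assumed, not an AMD figure)

def die_yield(area_mm2, d=D):
    return math.exp(-d * area_mm2 / 100.0)  # mm^2 -> cm^2

y_small = die_yield(213)  # one EPYC die, 213 mm^2 (from the slide)
y_mono = die_yield(777)   # hypothetical monolithic die, 777 mm^2

# Silicon cost per good package scales with area / yield; the 4-die MCM
# pays ~10% extra total area but wins heavily on yield.
cost_mcm = (4 * 213) / y_small
cost_mono = 777 / y_mono
print(f"MCM cost relative to monolithic: {cost_mcm / cost_mono:.2f}X")
```

With this assumed defect density the model gives roughly 0.62X, in the same ballpark as the slide's 0.59X figure (which is based on AMD's internal yield model; see the endnotes).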
  • 15. CHIP FLOORPLANNING FOR PACKAGE
  ▪ DDR placement on one die edge
  ▪ Chips in the 4-die MCM rotated 180°, with DDR facing the package left/right edges
  ▪ Package-top Infinity Fabric pinout requires diagonal placement of IFIS
  ▪ A 4th IFOP enables routing of high-speed I/O in only four package substrate layers
  [Die floorplan: two 4-core “Zen” + L3 CCXs in the center; IFOP links on the top and bottom edges; IFIS/PCIe and IFIS/PCIe/SATA in the corners; DDR on one edge; I/O alongside]
  • 16. ADVANCED PACKAGING
  Purpose-built MCM architecture
  ▪ 58mm x 75mm organic package
  ▪ 4 die-to-die IFOP links/die
  ‒ 3 connected/die
  ▪ 2 SERDES/die to pins
  ‒ 1/die to “top” (G) and “bottom” (P) links
  ‒ Balanced I/O for 1P and 2P systems
  ▪ 2 DRAM channels/die to pins
  [Die-to-package topology map: G[0-3], P[0-3] = 16-lane high-speed SERDES links; M[A-H] = 8 x72b DRAM channels; die-to-die Infinity Fabric On-Package links between the four dies; CCX = 4 cores + L2/core + shared L3 complex]
  • 17. CHIP ARCHITECTURE
  [Block diagram: the Infinity Fabric Scalable Data Fabric plane connects two CCMs (each serving a CCX of 4 cores + L3), two UMCs to DDR, an IOMS to the I/O complex (PCIe, SATA, Southbridge), and CAKEs to the IFOP and IFIS/PCIe® links, alongside the SCF and SMU]
  • 18. INFINITY FABRIC: MEMORY SYSTEM
  Optimized memory bandwidth and capacity per socket
  ▪ 8 DDR4 channels per socket
  ▪ Up to 2 DIMMs per channel, 16 DIMMs per socket
  ▪ Up to 2667MT/s: 21.3GB/s per channel, 171GB/s per socket
  ‒ Achieves 145GB/s per socket, 85% efficiency*
  ▪ Up to 2TB capacity per socket (8Gb DRAMs)
  ▪ Supported memory: RDIMM, LRDIMM, 3DS DIMM, NVDIMM-N
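The per-channel and per-socket figures follow directly from the DDR4 transfer rate; a quick sketch of the arithmetic:

```python
# Arithmetic behind the slide's memory bandwidth numbers: each DDR4
# channel moves 8 bytes per transfer, so peak BW = MT/s * 8 B * channels.
MT_S = 2667e6          # transfers per second per channel
BYTES_PER_XFER = 8     # 64-bit data bus (ECC bits not counted as bandwidth)
CHANNELS = 8

per_channel = MT_S * BYTES_PER_XFER / 1e9   # GB/s
per_socket = per_channel * CHANNELS
delivered = per_socket * 0.85               # slide's 85% measured efficiency
print(f"{per_channel:.1f} GB/s/channel, {per_socket:.0f} GB/s/socket, "
      f"~{delivered:.0f} GB/s delivered")
```

This reproduces the 21.3GB/s per channel, 171GB/s peak, and ~145GB/s delivered figures quoted above.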
  • 19. I/O SUBSYSTEM
  Architected for massive I/O systems with a leading-edge feature set and platform flexibility
  ▪ 8 x16 links available in 1- and 2-socket systems
  ‒ Link bifurcation support; max of 8 PCIe® devices per x16
  ‒ Socket-to-socket Infinity Fabric (IFIS), PCIe, and SATA support
  ▪ 32GB/s bidirectional bandwidth per link, 256GB/s per socket
  ▪ PCIe features*
  ‒ Advanced Error Reporting (AER)
  ‒ Enhanced Downstream Port Containment (eDPC)
  ‒ Non-Transparent Bridging (NTB)
  ‒ Separate RefClk with Independent Spread support (SRIS)
  ‒ NVMe and hot-plug support
  ‒ Peer-to-peer support
  ▪ Integrated SATA support
  * See PCI-SIG for details
  [Diagram: high-speed link (SERDES) capabilities — each x16 combo link (Type “A” and Type “B”) is built from PHYs (A0-A4, B0-B4) that can bifurcate down to x8/x4/x2/x1 xGMI, PCIe, or SATA ports. Restrictions: (1) ports from a single PHY must be of the same type (one of xGMI, PCIe, SATA); (2) a maximum of 8 PCIe ports in any 16-lane combo link; (3) ports from PHY A0 and PHY A1 must all be of the same type]
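The 32GB/s-per-link figure can be sanity-checked from the PCIe Gen3 line rate (8 GT/s per lane with 128b/130b encoding); note that the slide's 256GB/s per-socket number uses the conventionally rounded 16GB/s-per-direction value:

```python
# Sanity check on the link-bandwidth claims: a PCIe Gen3 x16 link carries
# ~15.8 GB/s each way after 128b/130b line-code overhead.
GT_S = 8e9              # Gen3 raw rate per lane, transfers/s
ENCODING = 128 / 130    # 128b/130b line code efficiency
LANES = 16
LINKS = 8               # per socket, from the slide

per_dir = GT_S * ENCODING * LANES / 8 / 1e9   # GB/s, one direction
bidir = 2 * per_dir                            # per x16 link
socket = bidir * LINKS
print(f"{per_dir:.2f} GB/s/direction, {bidir:.1f} GB/s/link, "
      f"{socket:.0f} GB/s/socket (slide rounds to 32 and 256)")
```

Protocol overheads (TLP headers, flow control) reduce delivered bandwidth further, which is why the next slide reports ~71-88% efficiency against the 16GB/s-per-direction nominal figure.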
  • 20. EPYC™—1P I/O PERFORMANCE
  High performance I/O and storage achieved
  ▪ High-bandwidth, high-efficiency PCIe®
  ‒ Local DMA, Peer-to-Peer (P2P)
  1x16 PCIe Bandwidth* (efficiency vs. 16GB/s per direction)
                        Local DRAM             Die-to-Die Infinity Fabric (1-hop)
                        (GB/s / Efficiency¹)   (GB/s / Efficiency¹)
  DMA Read              12.2 / 76%             12.1 / 76%
  DMA Write             13.0 / 81%             14.2 / 88%
  DMA Read + Write      22.8 / 71%             22.6 / 70%
  P2P Write             NA                     13.6 / 85%
  1: Peak efficiency of PCIe is ~85%-95% due to protocol overheads
  ▪ High-performance storage: 9.2M IOPs*
  ‒ EPYC™ 7601 1P, 16 x Samsung pm1725a NVMe, Ubuntu 17.04, FIO v2.16, 4KB 100% read
  ‒ ~285K IOPs/core with 32 cores
  ‒ 1P direct-connect, 24 drives x4 (no PCIe switches)
  • 21. POWER MANAGING I/O
  Dynamic Link Width Modulation
  ▪ The unprecedented connectivity that EPYC™ provides can consume additional power
  ▪ Full bandwidth is not required for all workloads
  ▪ Dynamic Link Width Modulation tracks the bandwidth demand of the Infinity Fabric connections between sockets and adapts link width to the need
  ▪ Up to 8% performance/Watt gain at the CPU socket*
  [Diagram: socket-to-socket fabric with 64 lanes of high-speed I/O in each direction]
  *See Endnotes
  • 22. POWER: PERFORMANCE-DETERMINISM, POWER-DETERMINISM, CONFIGURABLE TDP
  ▪ Server systems and environments vary
  ▪ Silicon varies
  ‒ Faster / higher-leakage parts
  ‒ Slower / lower-leakage parts
  ▪ Some customers demand repeatable, deterministic performance across all environments
  ‒ In this mode, power consumption will vary
  ▪ Others may want maximum performance at a fixed power consumption
  ‒ In this mode, performance will vary
  ▪ EPYC™ CPUs allow a boot-time choice of either mode
  [Charts: normalized performance and power across the system-and-silicon margin range (max to min) in Performance Determinism and Power Determinism modes]
  ▪ EPYC allows configurable TDP for TCO and peak-performance vs. performance/Watt optimization
  Product TDP   Low range   High range
  180W          165W        200W
  155W          135W        170W
  120W          105W        120W
  • 23. POWER: PER-CORE LINEAR VOLTAGE REGULATION
  ▪ Running all dies/cores at the voltage required by the slowest wastes power
  ▪ Per-core voltage regulator (LDO) capabilities enable each core’s voltage to be optimized
  ‒ Significant core power savings and variation reduction
  [Diagram: RVDD from the VRM output feeds per-core LDO-controlled supplies VDD0-VDD31 across Core0-Core31 on Die0-Die3]
  • 24. DELIVERED CPU PERFORMANCE
  ▪ Memory bandwidth and scaling
  ‒ 97% bandwidth scaling, 1P -> 2P
  ‒ Bandwidth for disruptive 1P and 2P performance
  ▪ INT and FP performance
  ‒ Memory bandwidth for demanding FP applications
  ‒ Competitive INT and FP performance
  ‒ Leadership TCO
  [Chart: STREAM Triad memory bandwidth, EPYC 7601 1P vs. 2P, GB/s scale 0-350]
  2P results — optimizing compilers (see endnotes for calculations)
                              AMD EPYC 7601            INTEL XEON 8176          EPYC Uplift
                              64 cores, 2.2 GHz        56 cores, 2.1 GHz
                              HP DL385 Gen10, AOCC     HP DL380 Gen10, ICC
  SPECrate®2017_int_base      272                      254                      +7%
  SPECrate®2017_fp_base       259                      225                      +15%
  Peak Mem BW                 8x2666                   6x2666                   +33%
  MSRP                        $4200                    $8719                    Better value
  EPYC Processors Deliver Compelling Performance, Memory Bandwidth, and Value
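The STREAM Triad kernel behind the bandwidth chart is simply a[i] = b[i] + scalar·c[i]. A minimal NumPy sketch that estimates delivered Triad bandwidth on whatever host runs it; the array size is an assumption and must comfortably exceed the last-level cache to measure DRAM rather than cache:

```python
# Sketch of the STREAM Triad kernel: a = b + scalar * c, timed.
# This is not the official STREAM benchmark, just an illustration of
# what the chart above measures.
import time
import numpy as np

def stream_triad(a, b, c, scalar):
    """Run a = b + scalar * c in place and return estimated GB/s."""
    t0 = time.perf_counter()
    np.multiply(c, scalar, out=a)
    np.add(a, b, out=a)          # no temporary arrays
    elapsed = time.perf_counter() - t0
    # Triad moves 3 streams (read b, read c, write a) = 24 bytes/element
    return 24 * a.size / elapsed / 1e9

if __name__ == "__main__":
    N = 20_000_000               # ~160 MB per array: larger than any cache
    b, c = np.random.rand(N), np.random.rand(N)
    a = np.empty_like(b)
    print(f"Triad bandwidth: {stream_triad(a, b, c, 3.0):.1f} GB/s")
```

The published numbers in the endnotes (147 GB/s for 1P, 290 GB/s for 2P on EPYC 7601) come from the official STREAM benchmark built with the Open64 compiler suite, which parallelizes the kernel across all cores and channels; a single-threaded sketch like this will report far less.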
  • 25. AMD EPYC™ 7000 SERIES PROCESSOR
  Architected for Enterprise and Datacenter Server TCO
  ▪ Up to 32 High-Performance “Zen” Cores
  ▪ 8 DDR4 Channels per CPU
  ▪ Up to 2TB Memory per CPU
  ▪ 128 PCIe® Lanes
  ▪ Dedicated Security Subsystem
  ▪ Integrated Chipset for 1-Chip Solution
  ▪ Socket-Compatible with Next Gen EPYC Processors
  • 26. GLOSSARY OF TERMS
  “Bulldozer” = AMD’s previous family of high performance CPU cores
  CAKE = Coherent AMD SocKet Extender
  CCX = Tightly coupled core + cache complex: 4 cores + L2 cache/core + shared L3 cache
  CLGI = Clear Global Interrupt
  DIMM = Dual Inline Memory Module
  ECC = Error Checking and Correcting
  SEC-DED = Single Error Correct, Double Error Detect
  DEC-TED = Double Error Correct, Triple Error Detect
  GPU = Graphics Processing Unit
  IPC = Instructions Per Clock
  IFIS = Infinity Fabric Inter-Socket
  IFOP = Infinity Fabric On-Package
  LDO = Low Drop-Out, a linear voltage regulator
  MCM = Multi-Chip Module
  NUMA = Non-Uniform Memory Architecture
  OS = Operating System
  Post Package Repair = Utilizes internal DRAM array redundancy to map out errant locations
  PSP = Platform Security Processor
  RAS = Reliability, Availability, Serviceability
  SATA = Serial ATA device connection
  SCH = Server Controller Hub
  SCM = Single Chip Module
  Scrubbing = Searching for, and proactively correcting, ECC errors
  SERDES = Serializer/Deserializer
  STGI = Set Global Interrupt
  TCO = Total Cost of Ownership
  TDP = Thermal Design Power
  Topology Reporting = Describing cache and memory layout via ACPI/software structures for OS/hypervisor task-scheduling optimization
  VM = Virtual Machine
  VRM = Voltage Regulator Module
  World Switch = Switch between guest operating system and hypervisor
  • 27. ENDNOTES
  SLIDE 9: Updated Feb 28, 2017: Generational IPC uplift for the “Zen” architecture vs. “Piledriver” architecture is +52% with an estimated SPECint_base2006 score compiled with GCC 4.6 -O2 at a fixed 3.4GHz. Generational IPC uplift for the “Zen” architecture vs. “Excavator” architecture is +64% as measured with Cinebench R15 1T, and also +64% with an estimated SPECint_base2006 score compiled with GCC 4.6 -O2, at a fixed 3.4GHz. System configs: AMD reference motherboard(s), AMD Radeon™ R9 290X GPU, 8GB DDR4-2667 (“Zen”) / 8GB DDR3-2133 (“Excavator”) / 8GB DDR3-1866 (“Piledriver”), Ubuntu Linux 16.x (SPECint_base2006 estimate) and Windows® 10 x64 RS1 (Cinebench R15). SPECint_base2006 estimates: “Zen” vs. “Piledriver” (31.5 vs. 20.7 | +52%), “Zen” vs. “Excavator” (31.5 vs. 19.2 | +64%). Cinebench R15 1T scores: “Zen” vs. “Piledriver” (139 vs. 79, both at 3.4GHz | +76%), “Zen” vs. “Excavator” (160 vs. 97.5, both at 4.0GHz | +64%). GD-108
  SLIDE 14: Based on AMD internal yield model using historical defect density data for mature technologies.
  SLIDE 18: In AMD internal testing on STREAM Triad, 2 x EPYC 7601 CPU in AMD “Ethanol” reference system, Ubuntu 16.04, Open64 v4.5.2.1 compiler suite, 512 GB (16 x 32GB 2Rx4 PC4-2666) memory, 1 x 500 GB SSD scored 290. NAP-22
  SLIDE 20: In AMD internal testing on IO bandwidth benchmarks: 2 x EPYC 7601 in AMD “Ethanol” reference systems, BIOS “TDL0080F” (default settings), Ubuntu 15.04.2, 512GB (16 x 32GB 2Rx4 PC4-2666) memory. Read and Write tests used AMD FirePro W7100 GPUs, 256B payload size, driver “amdgpu-pro version 17.10-450821”. Read + Write tests used a Mellanox IB 100Gbps adapter, 512B payload size, driver “OFED 3.3-1.04”. Read and Write DMA tests measured performance using AMD SST (System Stress Tool); Write peer-to-peer (P2P) tests used the AMD P2P (Peer to Peer) tool; Read + Write DMA tests used the OFED qperf tool.
1 x EPYC 7601 CPU in HPE Cloudline CL3150, Ubuntu 17.04 4.10 kernel (scheduler changed to NOOP, CPU governor set to performance), 256 GB (8 x 32GB 2Rx4 PC4-2666) memory, 24 x Samsung pm1725a NVMe drives (with only 16 enabled): FIO v2.16 (4 jobs per drive, IO depth of 32, 4K block size) average read IOPs 9,178,000 on 100% read test (average BW 35.85 GB/s); FIO (4 jobs per drive, IO depth of 10, 4K block size) average write IOPs 7,111,000 on 100% write test (average BW 27.78 GB/s). Each run was done for 30 seconds with a 10-second ramp-up using 16 NVMe drives. NAP-24
SLIDE 21: EPYC 7601 DVT part on AMD Ethanol 2P system, 32GB x16 (512GB) DR-2667 full pop 1of1, 1 x 500GB SSD, 1 x 1GBit NIC, Ubuntu 16.04 no GUI. Feature off gets an estimated score of 1297; feature on gets an estimated score of 1321. Using selectable power to increase from 180W to 195W with feature-off yields the same estimated score of 1321 as feature-on, indicating an effective 8% perf/Watt improvement at the socket level for feature-on (195W/180W).
  • 28. ENDNOTES
  SLIDE 24: MSRP for AMD EPYC 7601 of $4200 and for Intel Platinum 8176 of $8719, based on Anandtech: https://www.anandtech.com/show/11544/intel-skylake-ep-vs-amd-epyc-7000-cpu-battle-of-the-decade/10
  AMD EPYC 7601 1-socket to 2-socket memory bandwidth scaling achieves up to 1.97x. In AMD internal testing on STREAM Triad, 1 x EPYC 7601 CPU in AMD “Ethanol” reference system, Ubuntu 16.04, Open64 v4.5.2.1 compiler suite, 256 GB (8 x 32GB 2Rx4 PC4-2666) memory, 1 x 500 GB SSD scored 147; 2 x EPYC 7601 CPU in AMD “Ethanol” reference system, Ubuntu 16.04, Open64 v4.5.2.1 compiler suite, 512 GB (16 x 32GB 2Rx4 PC4-2666) memory, 1 x 500 GB SSD scored 290.
  AMD EPYC 7601 CPU-based 2-socket system scored 272 on SPECrate®2017_int_base, 7% higher than an Intel Platinum 8176-based 2-socket system which scored 254, based on publications on www.spec.org as of March 7, 2018. 2 x EPYC 7601 CPU in HPE ProLiant DL385 Gen10 scored 272 on SPECrate®2017_int_base, published November 2017 at www.spec.org, versus a 2 x Intel Xeon Platinum 8176 CPU in HPE ProLiant DL380 Gen10 score of 254, published October 2017 at www.spec.org. SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.
  https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171114-00833.pdf
  https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171003-00078.pdf
  AMD EPYC 7601 CPU-based 2-socket system scored 259 on SPECrate®2017_fp_base, 15% higher than an Intel Platinum 8176-based 2-socket system which scored 225, based on publications on www.spec.org as of March 7, 2018. 2 x EPYC 7601 CPU in HPE ProLiant DL385 Gen10 scored 259 on SPECrate®2017_fp_base, published November 2017 at www.spec.org, versus a 2 x Intel Xeon Platinum 8176 CPU in HPE ProLiant DL380 Gen10 score of 225, published October 2017 at www.spec.org.
SPEC and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information. https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171114-00845.pdf https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171003-00077.pdf
  • 29. 29 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE DISCLAIMER & ATTRIBUTION The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2018 Advanced Micro Devices, Inc. All rights reserved. AMD, EPYC, Radeon, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.