Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

AMD EPYC™ Microprocessor Architecture

6,055 views

Published on

From Cool Chips 2018, Jay Fleischman, Senior Fellow at AMD, presents AMD EPYC™ Microprocessor Architecture

Published in: Technology
  • Profit Maximiser redefined the notion of exploiting bookie offers as a longer-term, rather than a one-off opportunity. Seasoned users report steady month-by-month profits and support each other through a famously busy, private facebook group. The winner of our best matched betting product oscar has matured into something very, very special. ★★★ http://t.cn/A6hPRSfx
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Sex in your area is here: ❤❤❤ http://bit.ly/2F4cEJi ❤❤❤
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Follow the link, new dating source: ❤❤❤ http://bit.ly/2F4cEJi ❤❤❤
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Paid To Write? Earn up to $200/day on with simple writing jobs. ♥♥♥ https://tinyurl.com/vvgf8vz
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • A 7 Time Lotto Winner Stepped Up to Share His Secrets With YOU ➤➤ http://t.cn/Airf5UFH
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

AMD EPYC™ Microprocessor Architecture

  1. 1. AMD EPYCTM Microprocessor Architecture JAY FLEISCHMAN, NOAH BECK, KEVIN LEPAK, MICHAEL CLARK PRESENTED BY: JAY FLEISCHMAN | SENIOR FELLOW
  2. 2. 2 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to, the timing, features, functionality, availability, expectations, performance, and benefits of AMD's EPYCTM products , which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on current beliefs, assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements. Investors are urged to review in detail the risks and uncertainties in AMD's Securities and Exchange Commission filings, including but not limited to AMD's Quarterly Report on Form 10-K for the quarter ended December 30, 2017. CAUTIONARY STATEMENT
  3. 3. 3 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE IMPROVING TCO FOR ENTERPRISE AND MEGA DATACENTER Key Tenets of EPYC™ Processor Design CPU Performance ▪ “Zen”—high performance x86 core with Server features Virtualization, RAS, and Security ▪ ISA, Scale-out microarchitecture, RAS ▪ Memory Encryption, no application mods: security you can use Per-socket Capability ▪ Maximize customer value of platform investment ▪ MCM – allows silicon area > reticle area; enables feature set ▪ MCM – full product stack w/single design, smaller die, better yield Fabric Interconnect ▪ Cache coherent, high bandwidth, low latency ▪ Balanced bandwidth within socket, socket-to-socket Memory ▪ High memory bandwidth for throughput ▪ 16 DIMM slots/socket for capacity I/O ▪ Disruptive 1 Processor (1P), 2P, Accelerator + Storage capability ▪ Integrated Server Controller Hub (SCH) Power ▪ Performance or Power Determinism, Configurable TDP ▪ Precision Boost, throughput workload power management
  4. 4. 4 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE Decode 4 instructions/cycle 512K L2 (I+D) Cache 8 Way ADD MUL ADDMULALU 2 loads + 1 store per cycle 6 ops dispatched Op Cache INTEGER FLOATING POINT 8 micro-ops/cycle ALU ALU ALU Micro-op Queue 64K I-Cache 4 way Branch Prediction AGUAGU Load/Store Queues Integer Physical Register File 32K D-Cache 8 Way FP Register File Integer Rename Floating Point Rename Scheduler Scheduler Scheduler Scheduler SchedulerScheduler Scheduler “ZEN” MICROARCHITECTURE ▪ Fetch Four x86 instructions ▪ Op Cache 2K instructions ▪ 4 Integer units ‒ Large rename space – 168 Registers ‒ 192 instructions in flight/8 wide retire ▪ 2 Load/Store units ‒ 72 Out-of-Order Loads supported ‒ Smart prefetch ▪ 2 Floating Point units x 128 FMACs ‒ built as 4 pipes, 2 Fadd, 2 Fmul ‒ 2 AES units ‒ SHA instruction support ▪ I-Cache 64K, 4-way ▪ D-Cache 32K, 8-way ▪ L2 Cache 512K, 8-way ▪ Large shared L3 cache ▪ 2 threads per core
  5. 5. 5 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE INTEGER FLOATING POINT Integer Physical Register File FP Register File Decode 512K L2 (I+D) Cache 8 Way ADD MUL ADDMUL4x ALUs Micro-op Cache Micro-op Queue 64K I-Cache 4 way Branch Prediction Integer Rename Floating Point Rename Schedulers Scheduler L0/L1/L2 ITLB Retire Queue 2x AGUs Load Queue Store Queue 32K D-Cache 8 Way L1/L2 DTLBLoad Queue Store Queue 512K L2 (I+D) Cache 8 Way MUL ADD 32K D-Cache 8 Way L1/L2 DTLB Integer Physical Register File Integer Rename Schedulers FP Register File Floating Point Rename Scheduler 64K I-Cache 4 way Decode instructions ADDMUL4x ALUs Vertically Threaded Op-Cache micro-ops Micro-op Queue Branch Prediction L0/L1/L2 ITLB Competitively shared structures Statically Partitioned Competitively shared with Algorithmic Priority Competitively shared and SMT Tagged Retire Queue 6 ops dispatched 2x AGUs SMT OVERVIEW ▪ All structures fully available in 1T mode ▪ Front End Queues are round robin with priority overrides ▪ Increased throughput from SMT for data center workloads
  6. 6. 6 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE 32B/cycle 32B fetch 32B/cycle CORE 0 32B/cycle 32B/cycle 2*16B load 1*16B store 8M L3 I+D Cache 16-way 512K L2 I+D Cache 8-way 32K D-Cache 8-way 64K I-Cache 4-way “ZEN” CACHE HIERARCHY ▪ Fast private L2 cache, 12 cycles ▪ Fast shared L3 cache, 35 cycles ▪ L3 filled with victims of the L2 ▪ L2 tags duplicated in L3 for probe filtering and fast cache transfer ▪ Multiple smart prefetchers ▪ 50 outstanding misses supported from L2 to L3 per core ▪ 96 outstanding misses supported from L3 to memory HIGH BANDWIDTH, LARGE QUEUES FOR DATA CENTER
  7. 7. 7 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE CPU COMPLEX ▪ A CPU complex (CCX) is four cores connected to an L3 Cache ▪ The L3 Cache is 16-way associative, 8MB, mostly exclusive of L2 ▪ The L3 Cache is made of 4 slices, by low-order address interleave ▪ Every core can access every cache with same average latency ▪ Utilize upper level metals for high frequency data transfer ▪ Multiple CCXs instantiated at SoC level CORE 3 CORE 1L3M 1MB L 3 C T L L 2 C T L L2M 512K L3M 1MB CORE 3L3M 1MB L 3 C T L L 2 C T L L2M 512K L3M 1MB CORE 0 L3M 1MB L 3 C T L L 2 C T L L2M 512K L3M 1MB CORE 2 L3M 1MB L 3 C T L L 2 C T L L2M 512K L3M 1MB
  8. 8. 8 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE Resource Skylake-SP “Bulldozer” Family “Zen” Uops delivered 6 4 8 Issue Width 8 7 (4+3) 10(6+4) INT Reg 180 112 168 INT Sched 97 48 84 FP Reg 168 176 160 ROB 224 128 192 FADD,FMUL,FMA 4/4/4 5/5/5 3/4/5 FP Width 512 128 128 HIGH LEVEL COMPETITIVE ANALYSIS Resource Skylake-SP “Bulldozer" Family “Zen” LD Q 72 48 72 ST Q 56 32 44 Micro-op Cache 1.5K 0 2K L1 Icache 32K 96K 64K L1 Dcache 32K 32K 32K L2 Cache 1M 2M 512K L3 Cache/per core 1.33M 2M 2M L2 TLB Latency 7 25 7 L2 Latency 14 19 12 L3 Latency 50+ 60 35 CORE COMPARISONCACHE COMPARISON
  9. 9. 9 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE “ZEN” Totally New High-performance Core Design New High-Bandwidth, Low Latency Cache System Designed from the ground up for optimal balance of performance and power for the datacenter Simultaneous Multithreading (SMT) for High Throughput Energy-efficient FinFET Design for Enterprise-class Products ZEN DELIVERS GREATER THAN (vs Prior AMD processors) 52% IPC*
  10. 10. 10 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE VIRTUALIZATION ISA AND MICROARCHITECTURE ENHANCEMENTS ISA FEATURE NOTES “Bulldozer” “Zen” Data Poisoning Better handling of memory errors ✓ AVIC Advanced Virtual Interrupt Controller ✓ Nested Virtualization Nested VMLOAD, VMSAVE, STGI, CLGI ✓ Secure Memory Encryption (SME) Encrypted memory support ✓ Secure Encrypted Virtualization (SEV) Encrypted guest support ✓ PTE Coalescing Combines 4K page tables into 32K page size ✓ “Zen” Core built for the Data Center Microarchitecture Features Reduction in World Switch latency Dual Translation Table engines for TLBs Tightly coupled L2/L3 cache for scale-out Private L2: 12 cycles Shared L3: 35 cycles Cache + Memory topology reporting for Hypervisor/OS scheduling
  11. 11. 11 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE HARDWARE MEMORY ENCRYPTION SECURE MEMORY ENCRYPTION (SME) ▪ Protects against physical memory attacks ▪ Single key is used for encryption of system memory ‒ Can be used on systems with Virtual Machines (VMs) or Containers ▪ OS/Hypervisor chooses pages to encrypt via page tables or enabled transparently ▪ Support for hardware devices (network, storage, graphics cards) to access encrypted pages seamlessly through DMA ▪ No application modifications required Defense Against Unauthorized Access to Memory DRAM AES-128 Engine OS/Hypervisor VM Sandbox / VM Key Container Applications
  12. 12. 12 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE HARDWARE MEMORY ENCRYPTION SECURE ENCRYPTED VIRTUALIZATION (SEV) ▪ Protects VMs/Containers from each other, administrator tampering, and untrusted Hypervisor ▪ One key for Hypervisor and one key per VM, groups of VMs, or VM/Sandbox with multiple containers ▪ Cryptographically isolates the hypervisor from the guest VMs ▪ Integrates with existing AMD-V technology ▪ System can also run unsecure VMs ▪ No application modifications required DRAM AES-128 Engine OS/Hypervisor VM Sandbox / VM Key Key … … Container Key ApplicationsApplicationsApplications
  13. 13. 13 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE BUILT FOR SERVER: RAS (RELIABILITY, AVAILABILITY, SERVICEABILITY) DRAM Uncorrectable Error Core A Poison Cache DRAM Normal Access Good Core B (A) (B) Uncorrectable Error limited to consuming process (A) vs (B) Data Poisoning Enterprise RAS Feature Set ▪ Caches ‒ L1 data cache SEC-DED ECC ‒ L2+L3 caches with DEC-TED ECC ▪ DRAM ‒ DRAM ECC with x4 DRAM device failure correction ‒ DRAM Address/Command parity, write CRC—with replay ‒ Patrol and demand scrubbing ‒ DDR4 post package repair ▪ Data Poisoning and Machine Check Recovery ▪ Platform First Error Handling ▪ Infinity Fabric link packet CRC with retry
  14. 14. 14 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE MCM VS. MONOLITHIC ▪ MCM approach has many advantages ‒ Higher yield, enables increased feature-set ‒ Multi-product leverage ▪ EPYC Processors ‒ 4x 213mm2 die/package = 852mm2 Si/package* ▪ Hypothetical EPYC Monolithic ‒ ~777mm2* ‒ Remove die-to-die Infinity Fabric On-Package (IFOP) PHYs and logic (4/die), duplicated logic, etc. ‒ 852mm2 / 777mm2 = ~10% MCM area overhead * Die sizes as used for Gross-Die-Per-Wafer MCM Delivers Higher Yields, Increases Peak Compute, Maximizes Platform Value ▪ Inverse-exponential reduction in yield with die size ‒ Multiple small dies has inherent yield advantage I/ODDR CCX CCX Die 1 I/O I/ODDR CCX CCX Die 3 I/O I/ODDR CCX CCX Die 2 I/O I/ODDR CCX CCX Die 0 I/O CCX CCX CCX CCX CCX CCX CCX CCX I/O DDR I/O I/O I/O DDR DDRDDR I/O I/O I/O I/O 32C Die Cost 1.0X 32C Die Cost 0.59X1
  15. 15. 15 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE CHIP FLOORPLANNING FOR PACKAGE ▪ DDR placement on one die edge ▪ Chips in 4-die MCM rotated 180°, DDR facing package left/right edges ▪ Package-top Infinity Fabric pinout requires diagonal placement of IFIS ▪ 4th IFOP enables routing of high-speed I/O in only four package substrate layers IFIS/PCIe IFOP IFOP Zen Zen Zen Zen L3 Zen Zen Zen Zen L3 IFOP IFIS/PCIe/SATAIFOP DDR DDR I/O CCX CCX DDR I/O
  16. 16. 16 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE ADVANCED PACKAGING ▪ 58mm x 75mm organic package ▪ 4 die-to-die IFOP links/die ‒ 3 connected/die ▪ 2 SERDES/die to pins ‒ 1/die to “top” (G) and “bottom” (P) links ‒ Balanced I/O for 1P, 2P systems ▪ 2 DRAM-channels/die to pins Purpose-built MCM Architecture G[0-3], P[0-3] : 16 lane high-speed SERDES Links M[A-H] : 8 x72b DRAM channels : Die-to-Die Infinity Fabric On-Package links CCX: 4Core + L2/core + shared L3 complex P1 P0 P2 P3 G2 G3 G1 G0 MF MG MH ME Die-to-Package Topology Mapping MA MD MC MB I/ODDR CCX CCX Die 1 I/O I/ODDR CCX CCX Die 3 I/O I/ODDR CCX CCX Die 2 I/O I/ODDR CCX CCX Die 0 I/O
  17. 17. 17 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE CHIP ARCHITECTURE Infinity Fabric Scalable Data Fabric plane IFIS/PCIe® IFIS/PCIe/SATA IFOP IF SCF SMU CAKE CAKE CAKE PCIe Southbridge IO Complex SATA PCIe CCM CCX 4 cores + L3 CAKE CAKE IFOP CCX 4 cores + L3 CCM IFOP DDRDDRIFOP IOMS CAKE UMC UMC
  18. 18. 18 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE INFINITY FABRIC: MEMORY SYSTEM Optimized Memory Bandwidth and Capacity per Socket ▪ 8 DDR4 channels per socket ▪ Up to 2 DIMMs per channel, 16 DIMMs per socket ▪ Up to 2667MT/s, 21.3GB/s, 171GB/s per socket ‒ Achieves 145GB/s per socket, 85% efficiency* ▪ Up to 2TB capacity per socket (8Gb DRAMs) ▪ Supported memory ‒ RDIMM ‒ LRDIMM ‒ 3DS DIMM ‒ NVDIMM-N
  19. 19. 19 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE I/O SUBSYSTEM x16 x4 x2 x2 x8 x4 x4 x2 x2 x2 x2 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x Combo Link Type “B” Capabi x16 x8 x4 x4 x2 x2 x8 x4 x4 x2 x2 x2 x2 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x2 x2 x1 x1 x1 x1 x1 x1 x1 Restrictions: 1) Ports from a single PHY must be of the same type (one of xGMI, PCIe, SATA). x1 Combo Link Type “A” Capabilities x16x16 PHY A0 PHY A1 PHY A2 PHY A3 PHY A4 PHY B2 PHY B3 PHY B4 xGM x16 x8 x4 x4 x2 x2 x8 x4 x4 x2 x2 x2 x2 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x2 x2 x1 x1 x1 x1 Combo Link Type “B” Capabilities x16 x8 x4 x4 x2 x2 x8 x4 x4 x2 x2 x2 x2 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x2 x2 x1 x1 x1 x1 x1 x1 x1 Restrictions: 1) Ports from a single PHY must be of the same type (one of xGMI, PCIe, SATA). 2) There are a maximum of 8 PCIe ports in any 16-lane combo link. 3) Ports from PHY A0 and PHY A1 must all be of the same type (one of xGMI, PCIe, SATA). x1 Combo Link Type “A” Capabilities x16x16 PHY A0 PHY A1 PHY A2 PHY A3 PHY A4 PHY B0 PHY B1 PHY B2 PHY B3 PHY B4 PCIe Port SATA Port xGMI Link Architected for Massive I/O Systems with Leading-edge Feature Set, Platform Flexibility ▪ 8 x16 links available in 1 and 2 socket systems ‒ Link bifurcation support; max of 8 PCIe® devices per x16 ‒ Socket-to-Socket Infinity Fabric (IFIS), PCIe, and SATA support ▪ 32GB/s bi-dir bandwidth per link, 256GB/s per socket ▪ PCIe Features* ‒ Advanced Error Reporting (AER) ‒ Enhanced Downstream Port Containment (eDPC) ‒ Non-Transparent Bridging (NTB) ‒ Separate RefClk with Independent Spread support (SRIS) ‒ NVMe and Hot Plug support ‒ Peer-to-peer support ▪ Integrated SATA support * = See PCI-SIG for details PCIe IFIS Infinity Fabric PCIe IFIS Infinity Fabric SATA High Speed Link (SERDES) Capabilities
  20. 20. 20 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE EPYC™—1P I/O PERFORMANCE ▪ High bandwidth, high efficiency PCIe® ‒ Local DMA, Peer-to-Peer (P2P) High Performance I/O and Storage Achieved 1x16 PCIe Bandwidth* (Efficiency vs 16GB/s per direction) Local DRAM (GBs / Efficiency1) Die-to-Die Infinity Fabric (1-hop) (GBs / Efficiency1) DMA Read 12.2 / 76% 12.1 / 76% Write 13.0 / 81% 14.2 / 88% Read + Write 22.8 / 71% 22.6 / 70% P2P Write NA 13.6 / 85% 1: Peak efficiency of PCIe is ~85%-95% due to protocol overheads EPYC™ 7601 1P 16 x Samsung pm1725a NVMe, Ubuntu 17.04, FIO v2.16, 4KB 100% Read Results 9.2M IOPs* ▪ High performance storage ‒ ~285K IOPs/Core with 32 cores ‒ 1P direct-connect 24 drives x4 (no PCIe switches)
  21. 21. 21 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE POWER MANAGING I/O Dynamic Link Width Modulation ▪ The unprecedented connectivity that EPYC™ provides can consume additional power ▪ Full bandwidth is not required for all workloads ▪ Dynamic Link Width Modulation tracks the bandwidth demand of infinity fabric connections between sockets and adapts bandwidth to the need ▪ Up to 8% performance/Watt gains at the CPU socket* FABRIC 64 LANES HIGH SPEED I/O 64 LANES HIGH SPEED I/O *See Endnotes
  22. 22. 22 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE Power Performance POWER: PERFORMANCE-DETERMINISM, POWER-DETERMINISM, CONFIGURABLE TDP ▪ Server systems and environments vary ▪ Silicon varies ‒ Faster / higher leakage parts ‒ Slower / lower leakage parts ▪ Some customers demand repeatable, deterministic performance for all environments ‒ In this mode, power consumption will vary ▪ Others may want maximum performance with fixed power consumption ‒ In this mode, performance will vary ▪ EPYC™ CPUs allow boot-time choice of either mode 0.96 0.98 1 1.02 1.04 1.06 Performance Max  System and Silicon Margin  Min Power Determinism Mode 0 0.5 1 1.5 Power Perf Determinism Mode Normalized Performance and Power Max  System and Silicon Margin  Min Product TDP Low range High range 180W 165W 200W 155W 135W 170W 120W 105W 120W ▪ EPYC allows configurable TDP for TCO and peak performance vs performance/Watt optimization
  23. 23. 23 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE POWER: PER-CORE LINEAR VOLTAGE REGULATION ▪ Running all dies/cores at the voltage required by the slowest wastes power ▪ Per-core voltage regulator capabilities enable each core’s voltage to be optimized ‒ Significant core power savings and variation reduction Per-core LDO Controlled Core0 Core1 Core2 Core3 Core4 Core5 Core6 Core7 VDD0 VDD1 RVDD (VRM output) Core8 Core9 Core10 Core11 Core12 Core13 Core14 Core15 VDD8 VDD9 Die0 Die1 Core16 Core17 Core18 Core19 Core20 Core21 Core22 Core23 VDD16 VDD17 Die2 Core24 Core25 Core26 Core27 Core28 Core29 Core33 Core31 VDD24 VDD31 Die3
  24. 24. 24 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE 0 70 140 210 280 350 TRIAD TRIAD STREAM Memory Bandwidth (EPYC 7601 1P -> 2P) 1P 2P DELIVERED CPU PERFORMANCE ▪ Memory bandwidth and scaling ‒ 97% Bandwidth scaling 1P->2P ‒ Bandwidth for Disruptive 1P and 2P performance ▪ INT and FP performance ‒ Memory bandwidth for demanding FP applications ‒ Competitive INT and FP performance ‒ Leadership TCO Optimizing Compiler (See endnotes for calculations) AMD EPYC 7601 64 CORES, 2.2 GHz HP DL385 Gen10 AOCC INTEL XEON 8176 56 CORES, 2.1 GHz HP DL380 Gen10 ICC EPYC Uplift 2P SPECrate®2017_int_base 272 254 +7% SPECrate®2017_fp_base 259 225 +15% Peak Mem BW 8x2666 6x2666 +33% MSRP $4200 $8719 Better Value EPYC Processors Deliver Compelling Performance, Memory Bandwidth, and Value GB/s
  25. 25. 25 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE AMD EPYC™ 7000 SERIES PROCESSOR Architected for Enterprise and Datacenter Server TCO Up to 32 High-Performance “Zen” Cores 8 DDR4 Channels per CPU Up to 2TB Memory per CPU 128 PCIe® Lanes Dedicated Security Subsystem Integrated Chipset for 1-Chip Solution Socket-Compatible with Next Gen EPYC Processors
  26. 26. 26 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE GLOSSARY OF TERMS “Bulldozer” = AMD’s previous family of high performance CPU cores CAKE = Coherent AMD SocKet Extender CCX = Tightly Coupled Core + Cache Complex 4Core + L2 cache/core + Shared L3 cache complex CLGI = Clear Global Interrupt DIMM = Dual Inline Memory Module ECC = Error Checking and Correcting SEC-DED = Single Error Correct, Double Error Detect DEC-TED = Double Error Correct, Triple Error Detect GPU = Graphics Processing Unit IPC = Instructions Per Clock IFIS = Infinity Fabric Inter-Socket IFOP = Infinity Fabric On-Package LDO = Low Drop Out, a linear voltage regulator MCM = Multi-Chip Module NUMA = Non-Uniform Memory Architecture OS = Operating System Post Package Repair = Utilize internal DRAM array redundancy to map out errant locations PSP = Platform Security Processor RAS = Reliability, Availability, Serviceability SATA = Serial ATA device connection SCH = Server Controller Hub SCM = Single Chip Module Scrubbing = Searching for, and proactively correcting, ECC errors SERDES = Serializer/Deserializer STGI = Set Global Interrupt TCO = Total Cost of Ownership TDP = Thermal Design Power Topology Reporting = Describing cache and memory layout via ACPI/Software structures for OS/Hypervisor task scheduling optimization VM = Virtual Machine VRM = Voltage Regulator Module World Switch = Switch between Guest Operating System and Hypervisor
  27. 27. 27 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE ENDNOTES SLIDE 9 Updated Feb 28, 2017: Generational IPC uplift for the “Zen” architecture vs. “Piledriver” architecture is +52% with an estimated SPECint_base2006 score compiled with GCC 4.6 –O2 at a fixed 3.4GHz. Generational IPC uplift for the “Zen” architecture vs. “Excavator” architecture is +64% as measured with Cinebench R15 1T, and also +64% with an estimated SPECint_base2006 score compiled with GCC 4.6 –O2, at a fixed 3.4GHz. System configs: AMD reference motherboard(s), AMD Radeon™ R9 290X GPU, 8GB DDR4-2667 (“Zen”)/8GB DDR3-2133 (“Excavator”)/8GB DDR3-1866 (“Piledriver”), Ubuntu Linux 16.x (SPECint_base2006 estimate) and Windows® 10 x64 RS1 (Cinebench R15). SPECint_base2006 estimates: “Zen” vs. “Piledriver” (31.5 vs. 20.7 | +52%), “Zen” vs. “Excavator” (31.5 vs. 19.2 | +64%). Cinebench R15 1t scores: “Zen” vs. “Piledriver” (139 vs. 79 both at 3.4G | +76%), “Zen” vs. “Excavator” (160 vs. 97.5 both at 4.0G| +64%). GD-108 SLIDE 14 Based on AMD internal yield model using historical defect density data for mature technologies. SLIDE 18 In AMD internal testing on STREAM Triad, 2 x EPYC 7601 CPU in AMD “Ethanol” reference system, Ubuntu 16.04, Open64 v4.5.2.1 compiler suite, 512 GB (16 x 32GB 2Rx4 PC4-2666) memory, 1 x 500 GB SSD scored 290. NAP-22 SLIDE 20 In AMD internal testing on IO Bandwidth benchmarks. 2x EPYC 7601 in AMD “Ethanol” reference systems, BIOS “TDL0080F” (default settings), Ubuntu 15.04.2, 512GB (16 x 32GB 2Rx4 PC4-2666) memory. Read and Write tests used AMD FirePro W7100 GPUs, 256B payload size, Driver “amdgpu-pro version 17.10-450821”. Read + Write tests used Mellanox IB 100Gbps Adapter, 512B payload size, Driver “OFED 3.3-1.04”. Read and Write DMA tests performance using AMD SST (System Stress Tool). Write peer-to-peer (P2P) tests performance using AMD P2P (Peer to Peer) tool. Read + Write DMA tests performance using OFED qperf tool. 1 x EPYC 7601 CPU in HPE Cloudline CL3150, Ubuntu 17.04 4.10 kernel (Scheduler changed to NOOP, CPU governor set to performance), 256 GB (8 x 32GB 2Rx4 PC4-2666) memory, 24 x Samsung pm1725a NVMe drives (with only 16 enabled): FIO v2.16 (4 Jobs per drive, IO Depth of 32, 4K block size) Average Read IOPs 9,178,000 on 100% Read Test (Average BW 35.85 GB/s); FIO (4 jobs per drive, IO depth of 10, 4K block size) Average Write IOPs 7,111,000 on 100% Write Test (Average BW 27.78 GB/s Each run was done for 30 seconds with a 10 second ramp up using 16 NVMe drives. NAP-24 SLIDE 21 EPYC 7601 DVT part on AMD Ethanol 2P system, 32GB x16 (512GB) DR-2667 full pop 1of1, 1 500GB SSD, 1 1GBit Nic, Ubuntu 16.04 noGUI. Feature off gets an estimated score of 1297, feature on gets an estimated score of 1321. Selectable power used to increase from 180W to 195W with feature-off yields the same feature-on estimated score of 1321 indicating an effective 8% perf/Watt improvement at the socket level for feature-on (195W/180W)
  28. 28. 28 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE ENDNOTES SLIDE 24 MSRP for AMD EPYC 7601 of $4200 and for Intel Platinum 8176 of $8719 based on Anandtech https://www.anandtech.com/show/11544/intel-skylake-ep-vs-amd-epyc-7000-cpu-battle-of-the-decade/10 AMD EPYC 7601 1-socket to 2-socket memory bandwidth scaling achieves up to 1.97x. In AMD internal testing on STREAM Triad, 1 x EPYC 7601 CPU in AMD “Ethanol” reference system, Ubuntu 16.04, Open64 v4.5.2.1 compiler suite, 256 GB (8 x 32GB 2Rx4 PC4-2666) memory, 1 x 500 GB SSD scored 147; 2 x EPYC 7601 CPU in AMD “Ethanol” reference system, Ubuntu 16.04, Open64 v4.5.2.1 compiler suite, 512 GB (16 x 32GB 2Rx4 PC4-2666) memory, 1 x 500 GB SSD scored 290 AMD EPYC 7601 CPU-based 2-socket system scored 272 on SPECrate®2017_int_base, 7% higher than an Intel Platinum 8176-based 2-socket system which scored 254, based on publications on www.spec.org as of March 7, 2018. 2 x EPYC 7601 CPU in HPE ProLiant DL385 Gen10 scored 272 on SPECrate®2017_int_base published November 2017 at www.spec.org versus 2x Intel Xeon Platinum 8176 CPU in HPE ProLiant DL380 Gen10 score of 254 published October 2017 at www.spec.org. SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information. https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171114-00833.pdf https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171003-00078.pdf AMD EPYC 7601 CPU-based 2-socket system scored 259 on SPECrate®2017_fp_base, 15% higher than an Intel Platinum 8176-based 2-socket system which scored 225, based on publications on www.spec.org as of March 7, 2018. 2 x EPYC 7601 CPU in HPE ProLiant DL385 Gen10 scored 259 on SPECrate®2017_fp_base published November 2017 at www.spec.org versus 2x Intel Xeon Platinum 8176 CPU in HPE ProLiant DL380 Gen10 score of 225 published October 2017 at www.spec.org. SPEC and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information. https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171114-00845.pdf https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171003-00077.pdf
  29. 29. 29 COOLCHIPS 21 | AMD EPYCTM MICROPROCESSOR ARCHITECTURE DISCLAIMER & ATTRIBUTION The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2018 Advanced Micro Devices, Inc. All rights reserved. AMD, EPYC, Radeon, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

×