Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core

T. Singh, S. Rangarajan, D. John, R. Schreiber, S. Oliver, R. Seahra, A. Schaefer Presented at ISSCC 2020

  1. 2.1: Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core. © 2020 IEEE International Solid-State Circuits Conference (slide 1 of 33)
     T. Singh¹, S. Rangarajan¹, D. John¹, R. Schreiber¹, S. Oliver¹, R. Seahra², A. Schaefer¹
     ¹AMD, Austin, TX; ²AMD, Markham, ON, Canada
     Presented at ISSCC 2020
  2. Outline
     • Motivation
     • Market Segments
     • Architecture
     • Core Complex
     • Technology
     • Implementation
     • SRAMs
     • Power
     • Silicon Results
     • Conclusion
  3. Motivation
     • Zen was a huge lift
     • Zen 2 is a compelling successor to Zen
     • Goals
       – Deliver a generational performance improvement above the industry trend
       – Enable 2x cores in the same socket
       – Improve single-thread (1T) performance
     • How can we do this?
       – Technology port
       – Architectural changes
       – Physical design and methodology changes
     • AMD was aggressive and did all of the above to achieve the goals!
  4. Zen 2 Market Segments
  5. Zen 2 Architecture
     • Changes from Zen
       – New TAGE branch predictor
       – Optimized L1 instruction cache: 32KB/8-way vs. 64KB/4-way
       – 2x op cache capacity: 4K vs. 2K ops
       – 2x floating-point data path width: 256b vs. 128b
       – 3rd address generation unit
       – Larger physical structures: integer scheduler, PRF, ROB, store queue, L2 DTLB
       – 2x L1 data cache read/write bandwidth
       – 2x L3 cache: 16MB vs. 8MB per Core Complex (CCX)
     • +15%¹ single-thread (1T) IPC over Zen
     • ~9% switching capacitance (CAC) improvement over the previous generation, technology neutral
     ¹ An AMD "Zen 2" CPU-based system scored an estimated 15% higher than a previous-generation AMD "Zen"-based system using estimated SPECint®_base2006 results. SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org.
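The L1 instruction cache change on slide 5 (32KB/8-way vs. 64KB/4-way) can be made concrete by working out how many sets each geometry implies. The sketch below is illustrative only: the 64-byte line size is an assumption that the slide does not state, and `num_sets` is a hypothetical helper name.

```python
# Illustrative arithmetic only. LINE_BYTES = 64 is an assumption; the slide
# gives capacity and associativity but not the cache line size.
LINE_BYTES = 64


def num_sets(capacity_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets in a set-associative cache: capacity / (ways * line size)."""
    return capacity_bytes // (ways * line_bytes)


zen_l1i_sets = num_sets(64 * 1024, ways=4)   # Zen: 64KB, 4-way
zen2_l1i_sets = num_sets(32 * 1024, ways=8)  # Zen 2: 32KB, 8-way

print(f"Zen   L1I sets: {zen_l1i_sets}")
print(f"Zen 2 L1I sets: {zen2_l1i_sets}")
```

Under that assumption, halving the capacity while doubling the associativity leaves a quarter as many sets, each holding twice as many lines.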
  6. Core Functional Units
     • 32KB IC
     • 32KB DC
     • ~20 blocks, ~400K avg instances
     • ROM for uCODE
     • 5 L1 RAM variants
     • Chip Pervasive Logic (CPL): clock/test block
     [Figure: core floorplan with labeled blocks: Floating Point, Data Cache, Load/Store, ALU, Scheduler, Branch Prediction, I-Cache, Decode, L2 Cache, uCode, CPL]
  7. L2/L3 Cache Hierarchy
     • Only 3 unique custom macros
       – Down from 8 on Zen
     • Each 4MB slice is identical
     • Multi-stage clock gating in L3 keeps clock distribution power the same as the 8MB L3 from Zen
     • LDOs incorporated into the L3 to supply VDDM to L2 and L3 arrays
       – Loss of package distribution of VDDM meant LDOs had to be moved closer
       – Must reduce current on VDDM
     [Figure: L2/L3 floorplan with labels: CTL, L3 Tags, L3 Data, 4M Slice, L2 Data, L2 Tags, L2 Status, shadow tag macros for serving external probes, 512K L2, LDOs]
  8. Zen 2 Core Complex (CCX)
     • 4-core complex
     • L3 size increases to 16MB
     • Designed for flexibility
     • Maximize the number of cores for the server case
  9. Zen 2 CCX Configs
     • HEDT/Server: 4-core, 16MB L3 CCX
     • APU: 4-core, 4MB L3 CCX
     • Value: 2-core, 4MB L3 CCX
     • Zen 2 core can be used in various configs covering a wide power range
     • Multiple CCXs can be placed to achieve the desired core count

     | Cores | Market         | TDP       |
     |-------|----------------|-----------|
     | 8     | Notebook       | 15 W      |
     | 6     | Desktop        | 65 W      |
     | 8     | Desktop/Server | 65-120 W  |
     | 12    | Desktop/Server | 105-120 W |
     | 16    | Desktop/Server | 105-155 W |
     | 24    | HEDT/Server    | 155-280 W |
     | 32    | HEDT/Server    | 155-280 W |
     | 48    | Server         | 200-225 W |
     | 64    | HEDT/Server    | 200-280 W |
  10. Zen vs. Zen 2 Technology Comparison

      |                       | Zen                | Zen 2              |
      |-----------------------|--------------------|--------------------|
      | Tech                  | 14nm FinFET        | 7nm FinFET         |
      | Cores/CCX             | 4 Cores, 8 Threads | 4 Cores, 8 Threads |
      | Area/CCX              | 44 mm²             | 31.3 mm²           |
      | L2/core               | 512KB              | 512KB              |
      | L3/CCX                | 8MB                | 16MB               |
      | CPP                   | 78 nm              | 57 nm              |
      | Fin Pitch             | 48 nm              | 30 nm              |
      | 1x Metal Pitch        | 64 nm              | 57 nm              |
      | Stdcell Track Library | 10.5 track         | 6 track            |
      | Cu Metal Layers       | 11 w/ MiM          | 13 w/ MiM          |
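The comparison on slide 10 is easier to read as ratios. The following sketch just divides the Zen 2 column by the Zen column for a few rows; the dictionary keys are illustrative names, and the CPP × fin pitch product is only a crude footprint proxy, not a foundry density figure.

```python
# Values copied from the slide 10 table; key names are illustrative.
zen = {"area_mm2": 44.0, "l3_mb": 8, "cpp_nm": 78, "fin_pitch_nm": 48}
zen2 = {"area_mm2": 31.3, "l3_mb": 16, "cpp_nm": 57, "fin_pitch_nm": 30}

# Zen 2 value divided by Zen value for each row.
ratios = {key: zen2[key] / zen[key] for key in zen}

# Crude proxy for how much a unit of layout shrinks (assumption, not from the slide).
footprint_proxy = ratios["cpp_nm"] * ratios["fin_pitch_nm"]

for key, value in ratios.items():
    print(f"{key:>13}: {value:.3f}x")
print(f"CPP x fin pitch proxy: {footprint_proxy:.3f}x")
```

The CCX area falls to roughly 0.71x even though the L3 per CCX doubles.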
  11. Zen vs. Zen 2 Technology Comparison (cont.)

      | Zen (14nm) Layer Name | Pitch  | Zen 2 (7nm) Layer Name | Pitch     |
      |-----------------------|--------|------------------------|-----------|
      | M0 StdCell Internal   | n/a    | M0 StdCell Internal    | 1.0x      |
      | M1 StdCell Internal   | 1.0x   | M1 StdCell & BEOL      | 1.425x    |
      | M2-M3                 | 1.0x   | M2-M3                  | 1.0x-1.1x |
      | M4-M7                 | 1.25x  | M4-M7                  | 2.0x      |
      | M8-M9                 | 2.0x   | M8-M9                  | 2.0x      |
      | ---                   | ---    | M10-M11                | 3.15x     |
      | M10-M11 (RDL)         | 11.25x | M12-M13 (RDL)          | 18.0x     |
  12. Place and Route Design Optimization
      • 7nm FinFET presents unique route challenges
        – Lower layer jogs forbidden
        – Denser standard cells with reduction in track height
        – Increased lower level metal resistance
      • Deep collaboration between AMD CAD, foundry, and EDA partners
        – Cell density management
        – Advanced legalization techniques
        – Improved pre-route timing estimates
        – Wire engineering and via ladders
      [Figure: routing example labeled "Same-Layer Jogs Forbidden" and "Inter-Layer Jumpers Required"]
  13. Placement Restricted by Large Cells
      • Multi-row cells benefit power and area, but create placement challenges
      • Clustering of flops has many benefits but can cause placement issues
      • Resulting small gaps are challenging to use and required innovation to exploit
        – New algorithms
        – Flexible power grid choices
  14. Design RC Miscorrelation
      • Pre-route vs. post-route miscorrelation caused by length and layer assumptions
      • Pre-route miscorrelations for resistance and capacitance have differing root causes
        – Layer assignment for resistance
        – Length estimates for resistance and capacitance
      • Based on previously modeled trends, EDA tools may have challenges estimating delay
      • Required innovation to tackle

      | Layer | Normalized Resistance | Normalized Capacitance |
      |-------|-----------------------|------------------------|
      | M1    | 1.00                  | 1.00                   |
      | M2    | 3.17                  | 0.96                   |
      | M3    | 2.31                  | 0.96                   |
      | M4    | 0.72                  | 0.75                   |
      | M5    | 0.55                  | 0.83                   |
      | M6    | 0.52                  | 0.83                   |
      | M7    | 0.55                  | 0.83                   |
      | M8    | 0.52                  | 0.83                   |
      | M9    | 0.55                  | 0.92                   |
      | M10   | 0.16                  | 0.96                   |
      | M11   | 0.16                  | 0.92                   |
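To see why layer assignment matters so much for pre-route estimates, it helps to multiply the two columns of the slide 14 table. For a distributed RC wire of fixed length, delay scales with the product of resistance and capacitance, so the R × C product is a rough relative-delay indicator. This is a sketch under the assumption that the normalized values are per unit length, which the slide does not state.

```python
# (normalized resistance, normalized capacitance) copied from the slide 14 table.
# Assumption: both are per-unit-length values normalized to M1.
layers = {
    "M1": (1.00, 1.00), "M2": (3.17, 0.96), "M3": (2.31, 0.96),
    "M4": (0.72, 0.75), "M5": (0.55, 0.83), "M6": (0.52, 0.83),
    "M7": (0.55, 0.83), "M8": (0.52, 0.83), "M9": (0.55, 0.92),
    "M10": (0.16, 0.96), "M11": (0.16, 0.92),
}

# Relative RC product per layer: a rough proxy for wire delay at equal length.
rc = {name: r * c for name, (r, c) in layers.items()}

slowest = max(rc, key=rc.get)
fastest = min(rc, key=rc.get)

for name, value in rc.items():
    print(f"{name:>3}: RC = {value:.3f}")
print(f"slowest layer: {slowest}, fastest layer: {fastest}")
print(f"M2 vs. M4 RC ratio: {rc['M2'] / rc['M4']:.2f}x")
```

With these numbers, a pre-route estimate that assumes M4 for a net that actually routes on M2 would be off by more than 5x in RC product, which is consistent with the slide's point that layer assignment dominates the resistance miscorrelation.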
  15. Pre-Route Correlation Improvements
      • Plots show clock tree synthesis (CTS) vs. route timing
      • Large variance in initial results
        – A large number of paths have overly pessimistic delay during pre-route steps. Tools waste resources trying to fix them
        – A significant number of paths have optimistic delay estimates. These paths are under-optimized
      • Employed timing with targeted capacitance scaling and global-route-based layer estimation
        – Standard deviation dramatically improved while keeping a slightly pessimistic mean
      [Figure: timing slack correlation (cts_vs_route.slack.corr) and timing slack delta histogram (cts_vs_route.slack_delta.hist) for initial and improved results; histogram sides labeled Pessimistic and Optimistic]
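The slack-delta histogram on slide 15 summarizes, per path, how far the pre-route (CTS) slack estimate lands from the post-route slack. A minimal sketch of that summary statistic follows. The function name, the sign convention, and all slack values are made up for illustration; none of them come from the presentation.

```python
from statistics import mean, pstdev


def slack_delta_stats(cts_slack_ps, route_slack_ps):
    """Return (mean, standard deviation) of per-path route slack minus CTS slack.

    Sign convention (an assumption): a positive delta means the pre-route
    estimate was pessimistic, i.e. the routed path had more slack than predicted.
    """
    deltas = [route - cts for cts, route in zip(cts_slack_ps, route_slack_ps)]
    return mean(deltas), pstdev(deltas)


# Hypothetical per-path slacks in picoseconds, not data from the slides.
route_slack = [-10, -20, 5, 0, -5]
cts_initial = [-40, 10, -25, 30, -5]   # wide scatter, both pessimistic and optimistic
cts_improved = [-14, -27, 1, -6, -9]   # tight scatter, slightly pessimistic

initial_mean, initial_std = slack_delta_stats(cts_initial, route_slack)
improved_mean, improved_std = slack_delta_stats(cts_improved, route_slack)

print(f"initial : mean={initial_mean:.1f} ps, std={initial_std:.1f} ps")
print(f"improved: mean={improved_mean:.1f} ps, std={improved_std:.1f} ps")
```

The improved set mirrors the outcome the slide describes: a much smaller standard deviation with a mean that stays slightly on the pessimistic side.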
  16. Wire Engineering Challenges
      • Lower layers getting more resistive with latest technology nodes
        – Very short routes in tight data paths need a buffer
        – Routes longer than Steiner due to complex rules
        – Challenging for optimization tools to comprehend
      • Critical signals need to get to higher layers quickly
  17. Wire Engineering and Via Ladders
      • Team used selective layer optimization, buffering, pre-routes, and via ladders to exploit the fast layers for critical signals
      • Two types of via ladders
        – High Performance: for large buffers driving long wires
        – EM (electromigration): for high-activity gates (e.g., clock drivers)
          – Mitigated EM issues on large fanout nodes with high activity
      [Figure: top via ladder view and side via ladder view]
  18. L2/L3 Cache Changes
      • Zen had an on-die LDO to generate VDDM supply for use by cache arrays
      • Zen 2's package choices make using package layers for VDDM distribution impossible
      • Moved the bitline precharge from VDDM to VDD to reduce current
      [Figure: two SRAM column circuit schematics. Signal labels: VDDM, BLT[], BLC[], WRCS[], RDCSX[], SAPCX, SAEN, WDT_X, WDC_X, XCENX, SAT, SAC, SAT_INT, SAC_INT, BLPCX, WL[N:0], NegBL write driver]
  19. VDD Precharge Challenges
      • Moving bitline precharge to VDD creates both bitcell stability and writeability challenges
      • High level of configurability allows for silicon flexibility
      [Figure: assist controller block diagram. Fuses and System Management supply programming details and voltage thresholds to the assist controller, which drives the assist configurations (WLUdEn, NegBlEn, superVminEn, superVmaxEn) for the SRAMs. A voltage diagram labeled VDDM, VDDmin, VDDmax, and VDD carries these annotations:]
        – superVminEn=1 region: at the VDD where VDDM-VDD=superVminThreshold, the controller pauses the voltage increase and unsets the superVminEn register before continuing to raise voltage
        – superVmaxEn=1 region: at the VDD where VDD-VDDM=superVmaxThreshold, the controller pauses the voltage increase and sets the superVmaxEn register before continuing to raise voltage
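The threshold behavior annotated on slide 19 can be sketched as a small function of the two supply voltages. This is a hedged illustration, not AMD's controller: the function name, the millivolt values, and the exact boundary comparisons are assumptions, and the pause in the voltage ramp and any behavior on falling voltage are not modeled.

```python
def assist_enables(vdd_mv, vddm_mv, super_vmin_threshold_mv, super_vmax_threshold_mv):
    """Return (superVminEn, superVmaxEn) for a given VDD/VDDM pair.

    Assumed reading of the slide annotations:
      - superVminEn stays set while VDD is far enough below VDDM and is unset
        once VDDM - VDD has shrunk to superVminThreshold.
      - superVmaxEn is set once VDD - VDDM has grown to superVmaxThreshold.
    """
    super_vmin_en = (vddm_mv - vdd_mv) > super_vmin_threshold_mv
    super_vmax_en = (vdd_mv - vddm_mv) >= super_vmax_threshold_mv
    return super_vmin_en, super_vmax_en


# Hypothetical numbers in millivolts; none of these appear in the presentation.
VDDM_MV = 900
VMIN_THRESHOLD_MV = 200
VMAX_THRESHOLD_MV = 150

for vdd_mv in range(600, 1101, 50):
    vmin_en, vmax_en = assist_enables(vdd_mv, VDDM_MV, VMIN_THRESHOLD_MV, VMAX_THRESHOLD_MV)
    print(f"VDD={vdd_mv} mV  superVminEn={int(vmin_en)}  superVmaxEn={int(vmax_en)}")
```

Sweeping VDD upward with these made-up thresholds shows three regions: superVminEn set at low VDD, neither bit set in the middle, and superVmaxEn set once VDD is well above VDDM.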
  20. VDD Precharge Timing Challenges
      • Moving precharge to VDD reduced our current enough to allow on-die distribution but presents other challenges
      • Read-before-write timing challenges at low VDD, high VDDM
      [Figure: timing waveforms for WL at constant VDDM and BLPCX at high and low VDD. Annotations: "WL on before bitline precharge turns off at low VDD!"; "Bitline precharge turns on before WL turns off at high VDD!"; "Power races with WL"; "Read before write challenges"]
  21. Solving Timing Challenges
      • Solving these multiple-voltage timing challenges required a number of techniques
        – Dual-voltage clock shapers to average two voltage domains
          • Can alter the number of these buffers on VDD or VDDM, or remove them entirely, to make timing more or less dependent on either supply
        – False read-before-write problem can be mitigated by compressing the front end of the WL during a write operation
      [Figure: pseudo-dynamic level shifter (LS) between VDD and VDDM with signals ISOX@VDDM, Input@VDD, and shapedFallInput@VDD; WL clock shaping with signals WREN, WLCLK, and WLCLK_shape; waveforms for WL during read and WL during write]
  22. CAC Comparison
      • 3% decrease in flop power allocates more budget for combinational logic
  23. Flop Palette Improvements
      • Rich flop library: balance timing and power needs by driving the right flop mix
      • Up to 8% Fmax benefit from high-speed flops in timing-critical loop paths
      [Figure: flop palette chart with ends labeled "Best for Performance" and "Best for Power"]
  24. Low Power Gater Latch
      Energy with AvgApp activity (fJ):

      | State | LP Latch | Regular Latch | Ratio |
      |-------|----------|---------------|-------|
      | E=1   | 0.22     | 0.18          | 121%  |
      | E=0   | 0.17     | 1.61          | 10%   |
      | Total | 0.38     | 1.79          | 22%   |

      • 90% power savings in the latch for the common case of E = 0 through internal self-gating
      • Clock gater latch power contribution reduced from 22% in Zen to 13% in Zen 2 for an average application
      [Figure: low-power clock gater latch schematic. Signal labels: E, TE, CLK, CLKB, CLKBB, qf, qf_x, Q, Dbar]
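The "90% power savings" bullet on slide 24 follows from the E=0 row of the energy table. The short check below recomputes the ratios from the tabulated values; results can differ from the slide's Ratio column by about a point, presumably because the table entries are rounded. Variable names are illustrative.

```python
# Energies in fJ copied from the slide 24 table (AvgApp activity).
lp_latch = {"E=1": 0.22, "E=0": 0.17}
regular_latch = {"E=1": 0.18, "E=0": 1.61}

# LP latch energy as a fraction of the regular latch energy, per enable state.
ratio = {state: lp_latch[state] / regular_latch[state] for state in lp_latch}

# Savings in the common E=0 (gated) case.
e0_savings = 1.0 - ratio["E=0"]

for state, value in ratio.items():
    print(f"{state}: LP/regular = {value:.1%}")
print(f"E=0 savings: {e0_savings:.1%}")
```

The LP latch costs slightly more energy when enabled (E=1) but far less when gated (E=0), which is the trade the slide highlights.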
  25. Zen 2 Clock Optimization
      • Multi-mesh plan for the core supported by configurable clock tree construction
        – FP-level mesh gating enabled with minimal timing/area overhead
        – 15% mesh power savings in idle and average app
      • Tight clock skew distribution
      • Relocated clock spines and technology shrink (vs. Zen) achieve a similar skew profile while reducing CAC
  26. Zen vs. Zen 2 CAC Comparison
      • Primary sources of CAC reduction
        – 14 nm to 7 nm scaling
        – 6 track library
        – Aggressive microarchitectural CAC optimizations
  27. Generational Leadership Perf/Watt
      • Performance/Watt driven by a combination of technology and design improvements
      • Timing
        – Improved scalability by optimizing over a wider voltage range compared to Zen
        – Multi-corner optimization
      • Library choice and optimization
        – 6 track library enabled CAC/leakage savings beyond the default technology entitlement
      • Design CAC
        – Multi-bit flip-flop (MBFF) and low-power clock-gater library optimization
        – RTL improvements
        – CAC-aware downsizing methodology
      [Figure: "Power Improvements – ISO Frequency" chart going from Zen power @ 100% IPC to Zen 2 power @ 115% IPC, with contributions labeled 7nm CAC Savings, 7nm Timing, Design CAC Savings, and Library Choice]
  28. Frequency/Power Silicon Results
      • 4 cores active with 2 threads per core
      • The combined effect of lower Vmin for the same frequency and reduced CAC enabled a 50% reduction in power for a given frequency throughout most of the F(P) curve
      • This enables 2x cores in the same socket!
      [Figure: frequency vs. power curves, annotated "50% power reduction"]
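Slide 28 attributes the power reduction to two factors, lower voltage at the same frequency and reduced CAC. The standard dynamic power relation, P = CAC × V² × f, shows how they combine. The sketch below uses that relation with made-up ratios: the 0.65 CAC ratio and 0.88 voltage ratio are hypothetical, chosen only so the product lands near one half, and leakage power is ignored.

```python
def relative_dynamic_power(cac_ratio: float, voltage_ratio: float, freq_ratio: float = 1.0) -> float:
    """New-to-old dynamic power ratio from P = CAC * V^2 * f (leakage ignored)."""
    return cac_ratio * voltage_ratio ** 2 * freq_ratio


# Hypothetical inputs, NOT figures from the presentation.
CAC_RATIO = 0.65      # assumed total switching-capacitance ratio (new / old)
VOLTAGE_RATIO = 0.88  # assumed voltage ratio at the same frequency (new / old)

power_ratio = relative_dynamic_power(CAC_RATIO, VOLTAGE_RATIO)
print(f"dynamic power at iso-frequency: {power_ratio:.1%} of the previous generation")
```

Because voltage enters squared, a modest voltage reduction contributes about as much as a large CAC reduction in this example.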
  29. Frequency/Voltage Silicon Results
      • 1 core active with 2 threads per core, 3 cores idle
      • F/V curve improved over all voltages
      • Design worked to improve the low-voltage performance for improved linearity
      • Wide voltage range
  30. Conclusion
      • Met goals
        – Moved to energy-efficient TSMC 7nm FinFET
        – Made huge architectural changes
        – Improved PD and methodology
      • Results are clear
        – Scalable from 15W mobile to 280W server
        – 50% reduced power at iso-frequency
        – Enables 2x cores in the same socket
        – >15% 1T IPC over previous generation
        – ~9% CAC improvement over previous generation, technology neutral
        – Enables peak frequencies up to 4.7GHz (+350MHz generationally)
      • Zen 2 delivers generational performance uplift!
  31. Acknowledgements
      • We would like to thank our talented AMD design team across Austin, Fort Collins, Santa Clara, Boston, Markham, and India who contributed to Zen 2
      • Please stay for our chiplet paper next
  32. Disclaimer and Endnotes
      DISCLAIMER: The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18
      All rights reserved. AMD, the AMD Arrow logo, EPYC, RYZEN, Threadripper, Radeon, Infinity Fabric, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.