2.1: Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE
International Solid-State Circuits Conference
1 of 33
Zen 2: The AMD 7nm Energy-Efficient
High-Performance x86-64 Microprocessor Core
T. Singh¹, S. Rangarajan¹, D. John¹, R. Schreiber¹, S. Oliver¹, R. Seahra², A. Schaefer¹
¹AMD, Austin, TX; ²AMD, Markham, ON, Canada
Presented at ISSCC 2020
Outline
• Motivation
• Market Segments
• Architecture
• Core Complex
• Technology
• Implementation
• SRAMs
• Power
• Silicon Results
• Conclusion
Motivation
• Zen was a huge lift
• Zen 2 is a compelling successor to Zen
• Goals
– Deliver a generational performance improvement above the industry trend
– Enable 2x cores in the same socket
– Improve single-thread (1T) performance
• How can we do this?
– Technology port
– Architectural changes
– Physical design and methodology changes
• AMD was aggressive and we did all of the above to achieve the goals!!
Zen 2 Market Segments
Zen 2 Architecture
• Changes from Zen
– New TAGE Branch Predictor
– Optimized L1 Instruction Cache: 32K/8-way vs. 64K/4-way
– 2X Op Cache Capacity: 4K vs. 2K ops
– 2X Floating Point Data Path Width: 256b vs 128b
– 3rd Address Generation Unit
– Larger Physical Structures: Integer Scheduler, PRF, ROB, Store Queue, L2DTLB
– 2X L1 Data Cache Read/Write Bandwidth
– 2X L3 Cache: 16MB vs. 8MB per Core Complex (CCX)
• +15%¹ single-thread (1T) IPC over Zen
• ~9% switching capacitance (CAC) improvement over previous generation, technology neutral
¹ AMD "Zen 2" CPU-based system scored an estimated 15% higher than previous generation AMD "Zen" based system using estimated SPECint®_base2006 results. SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org.
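The L1 instruction cache change (32K/8-way vs. 64K/4-way) can be read as a change in set count. The sketch below is only an illustration: the 64-byte line size is an assumption that the slide does not state, and `num_sets` is a hypothetical helper name.

```python
# Sketch: number of sets in a set-associative cache.
# Assumption (not stated on the slide): 64-byte cache lines.
LINE_BYTES = 64


def num_sets(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Return sets = capacity / (associativity * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("capacity must be a multiple of ways * line size")
    return sets


zen_l1i_sets = num_sets(64 * 1024, ways=4)   # Zen:   64K, 4-way
zen2_l1i_sets = num_sets(32 * 1024, ways=8)  # Zen 2: 32K, 8-way
print(zen_l1i_sets, zen2_l1i_sets)
```

Under that assumption, halving the capacity while doubling the associativity reduces the set count by 4x.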
Core Functional Units
• 32KB IC
• 32KB DC
• ~20 blocks, ~400K avg instances
• ROM for uCODE
• 5 L1 RAM variants
• Chip Pervasive Logic (CPL) – clock/test block
[Figure: core floorplan; block labels: Floating Point, Data Cache, Load/Store, ALU, Scheduler, Branch Prediction, I-Cache, Decode, L2 Cache, uCode, CPL]
L2/L3 Cache Hierarchy
• Only 3 unique custom macros
– Down from 8 on Zen
• Each 4M slice is identical
• Multi-stage clock gating in L3 to keep clock distribution power the same as 8M L3 from Zen
• LDOs incorporated into the L3 to supply VDDM to L2 and L3 arrays
– Loss of package distribution of VDDM meant LDOs had to be moved closer
– Must reduce current on VDDM
[Figure: L2/L3 floorplan; labels: CTL, L3 Tags, L3 Data, 4M Slice, L2 Data, L2 Tags, L2 Status, 512K L2, LDOs, shadow tag macros for serving external probes]
Zen 2 Core Complex (CCX)
• 4-core complex
• L3 size increases to 16MB
• Design for flexibility
• Maximize number of cores for the server case
Zen 2 CCX Configs
• HEDT/Server: 4 Core, 16MB L3 CCX
• APU: 4 Core, 4MB L3 CCX
• Value: 2 Core, 4MB L3 CCX
• Zen 2 Core can be used in various configs covering a wide power range
• Multiple CCXs can be placed to achieve the desired core count
Cores  Market          TDP
8      Notebook        15 W
6      Desktop         65 W
8      Desktop/Server  65-120 W
12     Desktop/Server  105-120 W
16     Desktop/Server  105-155 W
24     HEDT/Server     155-280 W
32     HEDT/Server     155-280 W
48     Server          200-225 W
64     HEDT/Server     200-280 W
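As a rough illustration of "multiple CCXs placed to achieve the desired core count", the sketch below computes the minimum number of 4-core CCXs for each core count in the table. It is a lower bound only: the slide does not say how many CCXs each product actually uses, products may enable fewer than four cores per CCX, and `min_ccx_count` is a hypothetical helper.

```python
# Sketch: minimum number of 4-core CCXs needed for a given core count.
# Assumption: every CCX contributes up to 4 cores (the 2-core "Value" CCX is ignored).
CORES_PER_CCX = 4


def min_ccx_count(cores: int, cores_per_ccx: int = CORES_PER_CCX) -> int:
    """Ceiling division: smallest CCX count that can hold `cores` cores."""
    if cores <= 0:
        raise ValueError("core count must be positive")
    return -(-cores // cores_per_ccx)


for cores in (6, 8, 12, 16, 24, 32, 48, 64):
    print(f"{cores:2d} cores -> at least {min_ccx_count(cores)} CCX")
```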
Zen vs. Zen 2 Technology Comparison
Parameter              Zen                 Zen 2
Tech                   14nm FinFET         7nm FinFET
Cores/CCX              4 Cores, 8 Threads  4 Cores, 8 Threads
Area/CCX               44 mm²              31.3 mm²
L2/core                512KB               512KB
L3/CCX                 8MB                 16MB
CPP                    78 nm               57 nm
Fin Pitch              48 nm               30 nm
1x Metal Pitch         64 nm               57 nm
Stdcell Track Library  10.5 track          6 track
Cu Metal Layers        11 w/ MiM           13 w/ MiM
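The area and L3 rows of the comparison can be combined into two simple ratios. The sketch below uses only the table values. "L3 per mm² of CCX" is a coarse illustrative metric, since the CCX area also contains the cores and L2; it is not a figure from the slides.

```python
# Sketch: CCX area and L3 capacity ratios from the comparison table.
zen = {"area_mm2": 44.0, "l3_mb": 8}
zen2 = {"area_mm2": 31.3, "l3_mb": 16}

# Zen 2 CCX area as a fraction of the Zen CCX area.
area_ratio = zen2["area_mm2"] / zen["area_mm2"]

# L3 capacity per mm^2 of total CCX area (coarse metric, includes core and L2 area).
l3_per_mm2_gain = (zen2["l3_mb"] / zen2["area_mm2"]) / (zen["l3_mb"] / zen["area_mm2"])

print(f"CCX area ratio (Zen 2 / Zen): {area_ratio:.3f}")
print(f"L3 MB per mm^2 of CCX, gain:  {l3_per_mm2_gain:.2f}x")
```

So the CCX shrinks by roughly 29% in area even though its L3 capacity doubles.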
Zen vs. Zen 2 Technology Comparison (cont)
Zen (14nm)                           | Zen 2 (7nm)
Layer    Name              Pitch     | Layer    Name              Pitch
M0       StdCell Internal  n/a       | M0       StdCell Internal  1.0x
M1       StdCell Internal  1.0x      | M1       Stdcell & BEOL    1.425x
M2-M3                      1.0x      | M2-M3                      1.0x-1.1x
M4-M7                      1.25x     | M4-M7                      2.0x
M8-M9                      2.0x      | M8-M9                      2.0x
---                        ---       | M10-M11                    3.15x
M10-M11  (RDL)             11.25x    | M12-M13  (RDL)             18.0x
Place and Route Design Optimization
• 7nm FinFET presents unique route challenges
– Lower layer jogs forbidden
– Denser standard cells with reduction in track height
– Increased lower level metal resistance
• Deep collaboration between AMD CAD, foundry, and EDA partners
– Cell density management
– Advanced legalization techniques
– Improved pre-route timing estimates
– Wire Engineering and Via Ladders
[Figure: routing examples labeled "Same-Layer Jogs Forbidden" and "Inter-Layer Jumpers Required"]
Placement Restricted by Large Cells
• Multi-row cells benefit power and area, but create placement challenges
• Clustering of flops has many benefits but can cause placement issues
• Resulting small gaps are challenging to use and required innovation to exploit
– New algorithms
– Flexible power grid choices
Design RC Miscorrelation
• Pre-route vs Post-route miscorrelation caused by length and layer assumptions
• Pre-route miscorrelations for resistance and capacitance have differing root causes
– Layer assignment for resistance
– Length estimates for resistance and capacitance
• Based on previously modeled trends, EDA tools may have challenges estimating delay
• Required innovation to tackle
Layer  Normalized Resistance  Normalized Capacitance
M1     1.00                   1.00
M2     3.17                   0.96
M3     2.31                   0.96
M4     0.72                   0.75
M5     0.55                   0.83
M6     0.52                   0.83
M7     0.55                   0.83
M8     0.52                   0.83
M9     0.55                   0.92
M10    0.16                   0.96
M11    0.16                   0.92
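To see why layer assignment matters so much for pre-route estimates, the table can be turned into a relative RC product per layer. The sketch below assumes the normalized values are per unit length (the slide does not say so explicitly) and uses the standard result that the delay of a distributed RC wire of fixed length scales with r·c. `rc_product` is a hypothetical helper.

```python
# Sketch: relative wire RC per layer from the normalized table.
# Assumption: the normalized R and C values are per unit wire length.
LAYERS = {
    "M1": (1.00, 1.00), "M2": (3.17, 0.96), "M3": (2.31, 0.96),
    "M4": (0.72, 0.75), "M5": (0.55, 0.83), "M6": (0.52, 0.83),
    "M7": (0.55, 0.83), "M8": (0.52, 0.83), "M9": (0.55, 0.92),
    "M10": (0.16, 0.96), "M11": (0.16, 0.92),
}


def rc_product(layer: str) -> float:
    """Normalized r*c; distributed-RC wire delay at fixed length scales with this."""
    resistance, capacitance = LAYERS[layer]
    return resistance * capacitance


for name in LAYERS:
    print(f"{name:>3}: RC = {rc_product(name):.2f}")

# The same wire length guessed on M2 instead of M10 differs by about this factor.
print(f"M2 / M10 RC ratio: {rc_product('M2') / rc_product('M10'):.1f}")
```

Capacitance varies by at most 25% across layers while resistance varies by about 20x, which is consistent with the slide attributing the resistance miscorrelation to layer assignment.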
Pre-Route Correlation Improvements
• Plots showing ClockTreeSynthesis vs Route timing
• Large variance in initial results
– Large number of paths have overly pessimistic delay during pre-route steps. Tools waste resources trying to fix them
– Significant number of paths have optimistic delay estimates. These paths are under-optimized
• Employed timing with targeted capacitance scaling and global-route-based layer estimation
– Standard deviation dramatically improved while keeping a slightly pessimistic mean
[Figure: cts_vs_route.slack.corr ("Timing Slack Correlation") and cts_vs_route.slack_delta.hist ("Timing Slack Delta") plots for initial and improved results; histogram sides labeled Pessimistic and Optimistic]
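The two quantities tracked here, the mean and the standard deviation of the slack delta between clock tree synthesis and route, can be sketched as follows. The slack values are made-up numbers in picoseconds, not data from the slides, and the sign convention (post-route minus pre-route, so a positive delta means the pre-route estimate was pessimistic) is an assumption.

```python
# Sketch: summarizing CTS-vs-route slack miscorrelation.
# Assumed convention: delta = post-route slack - pre-route slack,
# so delta > 0 means the pre-route estimate was pessimistic.
import statistics


def slack_delta_stats(pre_route_ps, post_route_ps):
    """Return (mean, population standard deviation) of per-path slack deltas."""
    if len(pre_route_ps) != len(post_route_ps):
        raise ValueError("need one pre-route and one post-route slack per path")
    deltas = [post - pre for pre, post in zip(pre_route_ps, post_route_ps)]
    return statistics.mean(deltas), statistics.pstdev(deltas)


# Hypothetical per-path slacks in ps (illustrative only).
pre = [-10.0, 5.0, 0.0, 20.0]
post = [-2.0, 1.0, 6.0, 18.0]
mean_delta, std_delta = slack_delta_stats(pre, post)
print(f"mean delta = {mean_delta:.1f} ps, std dev = {std_delta:.2f} ps")
```

A small positive mean with a small standard deviation corresponds to the "slightly pessimistic mean" target described above.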
Wire Engineering Challenges
• Lower layers getting more resistive with latest technology nodes
– Very short routes in tight data paths need a buffer
– Routes longer than Steiner due to complex rules
– Challenging for optimization tools to comprehend
• Critical signals need to get to higher layers quickly
Wire Engineering and Via Ladders
• Team used selective layer optimization, buffering, pre-routes, and via ladders to exploit the fast layers for critical signals
• Two types of via ladders
– High Performance: for large buffers driving long wires
– EM: for high-activity gates (e.g., clock drivers)
– Mitigated EM issues on large fanout nodes with high activity
[Figure: top and side views of a via ladder]
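A very simplified lumped model can illustrate why both ladder types help. If each level of the climb uses several via cuts in parallel, the series via resistance drops (the performance case) and the current through each cut drops (the EM case). The model below ignores the metal stubs between levels, and all numbers are hypothetical; the slides give no via resistances or currents.

```python
# Sketch: lumped model of a via ladder (metal stub resistance ignored).
# All numeric values below are hypothetical.


def ladder_resistance(levels: int, cuts_per_level: int, r_cut_ohm: float) -> float:
    """Series resistance of `levels` via levels, each with parallel cuts."""
    if levels <= 0 or cuts_per_level <= 0:
        raise ValueError("levels and cuts_per_level must be positive")
    return levels * r_cut_ohm / cuts_per_level


def current_per_cut(total_current_ma: float, cuts_per_level: int) -> float:
    """Current carried by each cut, assuming an even split across parallel cuts."""
    if cuts_per_level <= 0:
        raise ValueError("cuts_per_level must be positive")
    return total_current_ma / cuts_per_level


single = ladder_resistance(levels=4, cuts_per_level=1, r_cut_ohm=20.0)
ladder = ladder_resistance(levels=4, cuts_per_level=4, r_cut_ohm=20.0)
print(f"single via stack: {single:.0f} ohm, 4-cut ladder: {ladder:.0f} ohm")
print(f"current per cut with 4 cuts: {current_per_cut(2.0, 4):.2f} mA")
```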
L2/L3 Cache Changes
• Zen had an on-die LDO to generate VDDM supply for use by cache arrays
• Zen 2’s package choices make using package layers for VDDM distribution impossible
• Moved the bitline precharge from VDDM to VDD to reduce current
[Figure: two SRAM column schematics, one including a NegBL write driver; signal labels: VDDM, BLT[], BLC[], WRCS[], RDCSX[], SAPCX, SAEN, WDT_X, WDC_X, XCENX, SAT, SAC, SAT_INT, SAC_INT, BLPCX, WL[N:0]]
VDD Precharge Challenges
• Moving bitline precharge to VDD creates both bitcell stability and writeability challenges
• High level of configurability allows for silicon flexibility
[Figure: assist control block diagram; labels: Fuses, System Management, Assist controller, SRAMs, assist configurations (WLUdEn, NegBlEn), programming details, voltage thresholds, superVminEn, superVmaxEn]
[Figure: VDDM vs. VDD plot from VDDmin to VDDmax with regions superVminEn=1 and superVmaxEn=1]
– At the VDD where VDDM-VDD=superVminThreshold: controller pauses voltage increase and unsets superVminEn register before continuing to raise voltage
– At the VDD where VDD-VDDM=superVmaxThreshold: controller pauses voltage increase and sets superVmaxEn register before continuing to raise voltage
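The threshold behavior described for superVminEn and superVmaxEn can be sketched as a pure function of the two supplies. This is an illustration only: the function name, the millivolt units, the example voltages, and the handling of the exact boundary points are all assumptions, and the slide does not describe what the bits do inside the arrays.

```python
# Sketch: deriving the two assist-enable bits from VDD and VDDM.
# Names, units (mV), and boundary handling are assumptions, not AMD's implementation.


def assist_enables(vdd_mv: int, vddm_mv: int,
                   super_vmin_threshold_mv: int,
                   super_vmax_threshold_mv: int) -> tuple:
    """Return (superVminEn, superVmaxEn) for a given pair of supply voltages."""
    # superVminEn stays set while VDD is far below VDDM; it is unset once
    # VDDM - VDD has shrunk to the threshold.
    super_vmin_en = (vddm_mv - vdd_mv) > super_vmin_threshold_mv
    # superVmaxEn is set once VDD exceeds VDDM by the threshold.
    super_vmax_en = (vdd_mv - vddm_mv) >= super_vmax_threshold_mv
    return super_vmin_en, super_vmax_en


# Hypothetical sweep of VDD with VDDM held at 900 mV.
for vdd in (600, 700, 800, 1000, 1050, 1100):
    print(vdd, assist_enables(vdd, 900, super_vmin_threshold_mv=200,
                              super_vmax_threshold_mv=150))
```

In a real controller the register write would be sequenced with the voltage ramp (pause, update the register, resume), which this stateless sketch does not model.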
VDD Precharge Timing Challenges
• Moving precharge to VDD reduced our current enough to allow on-die distribution but presents other challenges
• Read-before-write timing challenges at low VDD, high VDDM
[Figure: waveforms of WL at constant VDDM with BLPCX at high VDD and at low VDD; panels "Power races with WL" and "Read before write challenges"]
– WL on before bitline precharge turns off at low VDD!
– Bitline precharge turns on before WL turns off at high VDD!
Solving Timing Challenges
• Solving these multiple-voltage timing challenges required a number of techniques
– Dual-voltage clock shapers to average two voltage domains
• Can alter the number of these buffers on VDD or VDDM or remove them entirely to make timing more or less dependent on either supply
– False read-before-write problem can be mitigated by compressing the front end of the WL during a write operation
[Figure: pseudo-dynamic level shifter (LS) between VDD and VDDM; signal labels: ISOX@VDDM, Input@VDD, shapedFallInput@VDD]
[Figure: WL shaping; signal labels: WREN, WLCLK, WLCLK_shape, WL during read, WL during write]
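The idea of tuning how strongly a shaper's delay depends on each supply can be illustrated with a toy model. The sketch below sums per-buffer delays for buffers placed on VDD and on VDDM, using an alpha-power-law style delay expression with made-up parameters (`vt`, `alpha`, `k`). It is not the circuit from the slides, only a way to see the trend.

```python
# Sketch: toy model of a dual-voltage clock shaper chain.
# The delay expression and all parameters are illustrative assumptions.


def buffer_delay(v: float, vt: float = 0.3, alpha: float = 1.3, k: float = 10.0) -> float:
    """Alpha-power-law style buffer delay (arbitrary units), falling as v rises."""
    if v <= vt:
        raise ValueError("supply must exceed the threshold voltage")
    return k * v / (v - vt) ** alpha


def shaper_delay(n_vdd: int, n_vddm: int, vdd: float, vddm: float) -> float:
    """Total delay of a chain with n_vdd buffers on VDD and n_vddm on VDDM."""
    return n_vdd * buffer_delay(vdd) + n_vddm * buffer_delay(vddm)


# Hypothetical operating points: VDDM fixed at 1.0 V, VDD swept.
for vdd in (0.7, 0.9, 1.1):
    all_vdd = shaper_delay(4, 0, vdd, 1.0)
    mixed = shaper_delay(2, 2, vdd, 1.0)
    all_vddm = shaper_delay(0, 4, vdd, 1.0)
    print(f"VDD={vdd:.1f}: all-VDD={all_vdd:.1f} mixed={mixed:.1f} all-VDDM={all_vddm:.1f}")
```

In this model, moving buffers from VDD to VDDM flattens the delay across the VDD sweep, which mirrors the slide's point that the buffer mix makes timing more or less dependent on either supply.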
CAC Comparison
• 3% decrease in flop power allocates more budget for combinational logic
FLOP Palette Improvements
• Rich flop library: balance timing/power needs by driving the right flop mix
• Up to 8% Fmax benefit from high speed flops in timing critical loop paths
[Figure: flop palette, ordered from "Best for Performance" to "Best for Power"]
Low Power Gater Latch
Energy with AvgApp Activity (fJ)
State  LP Latch  Regular Latch  Ratio
E=1    0.22      0.18           121%
E=0    0.17      1.61           10%
Total  0.38      1.79           22%
[Figure: low power gater latch schematic; signal labels: E, TE, CLK, CLKB, CLKBB, Dbar, qf, qf_x, Q]
• 90% power savings in latch for common case of E = 0 through internal self gating
• Clock gater latch power contribution reduced from 22% in Zen to 13% in Zen 2 for an average application
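The ratios in the energy table can be rechecked from its rounded entries. The sketch below assumes the "Total" row is the sum of the E=1 and E=0 rows, which the numbers appear to support. Because the inputs are already rounded to two decimals, the recomputed values can differ from the slide by a small amount in the last digit.

```python
# Sketch: recomputing the gater-latch energy ratios from the rounded table (fJ).
# Assumption: "Total" is the sum of the E=1 and E=0 contributions.
ENERGY_FJ = {          # state: (LP latch, regular latch)
    "E=1": (0.22, 0.18),
    "E=0": (0.17, 1.61),
}

ratios = {state: lp / regular for state, (lp, regular) in ENERGY_FJ.items()}
total_lp = sum(lp for lp, _ in ENERGY_FJ.values())
total_regular = sum(regular for _, regular in ENERGY_FJ.values())
saving_e0 = 1.0 - ratios["E=0"]

for state, ratio in ratios.items():
    print(f"{state}: LP/regular = {ratio:.0%}")
print(f"Total: {total_lp:.2f} vs {total_regular:.2f} fJ "
      f"({total_lp / total_regular:.0%})")
print(f"Saving when E=0: {saving_e0:.0%}")
```

The LP latch costs slightly more energy when enabled (E=1) but far less when disabled (E=0), and the disabled state dominates the regular latch's total, which is where the net saving comes from.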
Zen 2 Clock Optimization
• Multi-mesh plan for the core supported by configurable clock tree construction
– FP level mesh gating enabled with minimal timing/area overhead
– 15% mesh power savings in Idle and Average App
• Tight clock skew distribution
• Relocated clock spines and technology shrink (vs. Zen) achieve a similar skew profile while reducing CAC
Zen vs. Zen 2 CAC Comparison
• Primary sources of CAC reduction
– 14 nm to 7 nm scaling
– 6 track library
– Aggressive microarchitectural CAC optimizations
Generational Leadership Perf/Watt
• Performance/Watt driven by a combination of technology and design improvements
• Timing
– Improved scalability by optimizing at a wider voltage range compared to Zen
– Multi-corner optimization
• Library choice and optimization
– 6 track library enabled additional CAC/leakage savings in addition to default technology entitlement
• Design CAC
– MBFF, low power clock-gater library optimization
– RTL improvements
– CAC aware downsizing methodology
[Figure: "Power Improvements – ISO Frequency" chart from Zen power @ 100% IPC to Zen 2 power @ 115% IPC; contributions labeled 7nm CAC Savings, 7nm Timing, Design CAC Savings, Library Choice]
Frequency/Power Silicon Results
• 4 cores active with 2 threads per core
• The combined effect of lower Vmin for the same frequency and reduced CAC enabled a 50% reduction in power for a given frequency throughout most of the F(P) curve
• This enables 2x cores in the same socket!!
[Figure: F(P) silicon curves, annotated "50% power reduction"]
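How a lower voltage and a lower CAC combine can be sketched with the standard dynamic power relation P = CAC · V² · f. The CAC ratio used below is a made-up example, since the slides do not give the total CAC reduction as a number, and leakage is ignored.

```python
# Sketch: iso-frequency dynamic power ratio from P = CAC * V^2 * f.
# The 0.70 CAC ratio is a hypothetical example; leakage is ignored.
import math


def power_ratio(cac_ratio: float, v_ratio: float) -> float:
    """New/old dynamic power at the same frequency."""
    return cac_ratio * v_ratio ** 2


def required_v_ratio(target_power_ratio: float, cac_ratio: float) -> float:
    """Voltage ratio needed to reach a target power ratio for a given CAC ratio."""
    return math.sqrt(target_power_ratio / cac_ratio)


cac = 0.70                                   # hypothetical total CAC ratio
v = required_v_ratio(0.50, cac)              # voltage ratio for half the power
print(f"CAC ratio {cac:.2f} needs V ratio {v:.3f} for 50% power")
print(f"check: power ratio = {power_ratio(cac, v):.2f}")
```

Because voltage enters squared, a modest reduction in the voltage needed for a given frequency contributes as much as a substantial CAC reduction.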
Frequency/Voltage Silicon Results
• 1 core active with two threads per core, 3 cores idle
• F/V curve improved over all voltages
• Design worked to improve the low voltage performance for improved linearity
• Wide voltage range
Conclusion
• Met Goals
– Moved to energy-efficient TSMC 7nm FinFET
– Made huge architectural changes
– Improved PD and methodology
• Results are clear
– Scalable across 15W mobile to 280W server
– 50% reduced power at iso-frequency
– Enables 2x cores in the same socket
– >15% 1T IPC over previous generation
– ~9% CAC improvement over previous generation, technology neutral
– Enables peak frequencies up to 4.7GHz (+350MHz generationally)
• Zen 2 delivers generational performance uplift!!
Acknowledgements
• We would like to thank our talented AMD design team across Austin, Fort Collins, Santa Clara, Boston, Markham, and India who contributed to Zen 2
• Please stay for our chiplet paper next
2.2 : AMD Chiplet Architecture for High-Performance Server and Desktop Products
Disclaimer and Endnotes
DISCLAIMER
The information contained herein is for informational purposes only, and is subject to change without notice. While every
precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and
typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro
Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this
document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or
fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described
herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this
document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed
agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18
All rights reserved. AMD, the AMD Arrow logo, EPYC, RYZEN, Threadripper, Radeon, Infinity Fabric, and combinations
thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification
purposes only and may be trademarks of their respective companies.
2.1: Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE
International Solid-State Circuits Conference
33 of 33

Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core

  • 1.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 1 of 33 Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core T. Singh1, S. Rangarajan1, D. John1, R. Schreiber1, S. Oliver1, R. Seahra2, A. Schaefer1 1AMD, Austin, TX, 2AMD, Markham, ON, Canada Presented at ISSCC 2020
  • 2.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 2 of 33 Outline • Motivation • Market Segments • Architecture • Core Complex • Technology • Implementation • SRAMs • Power • Silicon Results • Conclusion
  • 3.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 3 of 33 Motivation • Zen was a huge lift • Zen2 compelling successor to Zen • Goals – Give above industry trend generational performance improvement – Enable 2x cores same socket – Improve single thread (1T) performance • How can we do this? – Technology port – Architectural changes – Physical design and methodology changes • AMD was aggressive and we did all of the above to achieve the goals!!
  • 4.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 4 of 33 Zen 2 Market Segments
  • 5.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 5 of 33 Zen 2 Architecture • Changes from Zen – New TAGE Branch Predictor – Optimized L1 Instruction Cache: 32K/8-way vs. 64K/4-way – 2X Op Cache Capacity: 4K vs. 2K ops – 2X Floating Point Data Path Width: 256b vs 128b – 3rd Address Generation Unit – Larger Physical Structures: Integer Scheduler, PRF, ROB, Store Queue, L2DTLB – 2X L1 Data Cache Read/Write Bandwidth – 2X L3 Cache: 16MB vs. 8MB per Core Complex (CCX) • +15%1 single thread (1T) IPC over Zen • ~9% switching capacitance (CAC) improvement over previous generation, technology neutral 1 AMD "Zen 2" CPU-based system scored an estimated 15% higher than previous generation AMD “Zen” based system using estimated SPECint®_base2006 results. SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org.
  • 6.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 6 of 33 Core Functional Units • 32KB IC • 32KB DC • ~20 blocks, ~400K avg instances • ROM for uCODE • 5 L1 RAM variants • Chip Pervasive Logic (CPL) – clock/test block Floating Point Data Cache Load/ Store ALU Scheduler Branch Prediction I-CacheDecode L2 Cache uCode CPL
  • 7.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 7 of 33 L2/L3 Cache Hierarchy • Only 3 unique custom macros – Down from 8 on Zen • Each 4M slice is identical • Multi-stage clock gating in L3 to keep clock distribution power the same as 8M L3 from Zen • LDOs incorporated into the L3 to supply VDDM to L2 and L3 arrays – Loss of package distribution of VDDM meant LDOs had to be moved closer – Must reduce current on VDDM CTLL3Tags L3Data 4M Slice L2 Data L2 Tags L2 Status Shadow tag macros for serving external probes 512K L2 LDOs
  • 8.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 8 of 33 Zen 2 Core Complex (CCX) • 4 core complex • L3 size increases to 16MB • Design for flexibility • Maximize # cores for server case
  • 9.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 9 of 33 Zen 2 CCX Configs HEDT/Server 4 Core, 16MB L3 CCX APU 4 Core, 4MB L3 CCX Value 2 Core, 4MB L3 CCX • Zen 2 Core can be used in various configs covering a wide power range • Multiple CCX can be placed to achieved desired core count Cores Market TDP 8 Notebook 15W 6 Desktop 65 W 8 Desktop/Server 65-120 W 12 Desktop/Server 105-120 W 16 Desktop/Server 105-155 W 24 HEDT/Server 155-280 W 32 HEDT/Server 155-280 W 48 Server 200-225 W 64 HEDT/Server 200-280 W
  • 10.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 10 of 33 Zen vs. Zen 2 Technology Comparison Zen Zen 2 Tech 14nm FinFET 7nm FinFET Cores/CCX 4 Cores, 8 Threads 4 Cores, 8 Threads Area/CCX 44 mm2 31.3 mm2 L2/core 512KB 512KB L3/CCX 8MB 16MB CPP 78 nm 57 nm Fin Pitch 48 nm 30 nm 1x Metal Pitch 64 nm 57 nm Stdcell Track Library 10.5 track 6 track Cu Metal Layers 11 w/ MiM 13 w/ MiM
  • 11.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 11 of 33 Zen vs. Zen 2 Technology Comparison (cont) Zen (14nm) Zen 2 (7nm) Layer Name Pitch Layer Name Pitch M0 StdCell Internal n/a M0 StdCell Internal 1.0x M1 StdCell Internal 1.0x M1 Stdcell & BEOL 1.425x M2-M3 1.0x M2-M3 1.0x-1.1x M4-M7 1.25x M4-M7 2.0x M8-M9 2.0x M8-M9 2.0x --- --- M10-M11 3.15x M10-M11 (RDL) 11.25x M12-M13 (RDL) 18.0x
  • 12.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 12 of 33 Place and Route Design Optimization • 7nm FinFET presents unique route challenges – Lower layer jogs forbidden – Denser standard cells with reduction in track height – Increased lower level metal resistance • Deep collaboration between AMD CAD, foundry, and EDA partners – Cell density management – Advanced legalization techniques – Improved pre-route timing estimates – Wire Engineering and Via Ladders Same-Layer Jogs Forbidden Inter-Layer Jumpers Required
  • 13.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 13 of 33 Placement Restricted by Large Cells • Multi-row cells benefit power and area, but create placement challenges • Clustering of flops has many benefits but can cause placement issues • Resulting small gaps are challenging to use and required innovation to exploit – New algorithms – Flexible power grid choices
  • 14.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 14 of 33 Design RC Miscorrelation • Pre-route vs Post-route miscorrelation caused by length and layer assumptions • Pre-route miscorrelations for resistance and capacitance have differing root causes – Layer assignment for resistance – Length estimates for resistance and capacitance • Based on previously modeled trends, EDA tools may have challenges estimating delay • Required innovation to tackle Layer Normalized Resistance Normalized Capacitance M1 1.00 1.00 M2 3.17 0.96 M3 2.31 0.96 M4 0.72 0.75 M5 0.55 0.83 M6 0.52 0.83 M7 0.55 0.83 M8 0.52 0.83 M9 0.55 0.92 M10 0.16 0.96 M11 0.16 0.92
  • 15.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 15 of 33 Pre-Route Correlation Improvements • Plots showing ClockTreeSynthesis vs Route timing • Large variance in initial results – Large number of paths have overly- pessimistic delay during pre-route steps. Tools waste resources trying to fix – Significant number of paths have optimistic delay estimates. These paths are under- optimized • Employed timing with targeted capacitance scaling and global route- based layer estimation – Standard deviation dramatically improved while keeping a slightly pessimistic mean cts_vs_route.slack.corr cts_vs_route.slack_delta.hist cts_vs_route.slack.corr cts_vs_route.slack_delta.hist Timing Slack Correlation Timing Slack Delta Initial Results Improved Results Pessimistic Optimistic
  • 16.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 16 of 33 Wire Engineering Challenges • Lower layers getting more resistive with latest technology nodes – Very short routes in tight data paths need a buffer – Routes longer than Steiner due to complex rules – Challenging for optimization tools to comprehend • Critical signals need to get to higher layers quickly
  • 17.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 17 of 33 Wire Engineering and Via Ladders • Team used selective layer optimization, buffering, pre-routes, and via ladders to exploit the fast layers for critical signals • Two types of via ladders – High Performance: for large buffers driving long wires – EM: for high-activity gates (e.g., clock drivers) – Mitigated EM issues on large fanout nodes with high activity Top Via Ladder View Side Via Ladder View
  • 18.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 18 of 33 L2/L3 Cache Changes • Zen had an on-die LDO to generate VDDM supply for use by cache arrays • Zen 2’s package choices make using package layers for VDDM distribution impossible • Moved the bitline precharge from VDDM to VDD to reduce current VDDM VDDM BLT[] BLC[] WRCS[] RDCSX[] SAPCX SAEN WDT_X WDC_X XCENX SAT SAC SAC_INT SAT_INT BLPCX WL[N:0] BLT[] BLC[] WRCS[] RDCSX[] SAPCX SAEN XCENX SAT SAC SAT SAC BLPCX WL[N:0] NegBL Write DriverWDT_X WDC_X
  • 19.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 19 of 33 • Moving bitline precharge to VDD creates both bitcell stability and writeability challenges • High level of configurability allows for silicon flexibility VDD Precharge Challenges SRAMSRAMSRAMSRAMs WLUdEn NegBlEn Assist configurations Fuses Assist controller System Management Programming details superVminEn superVmaxEn Voltage thresholds superVmaxEn=1superVminEn=1 VDDM VDDmax VDDmin VDD Controller pauses voltage increase and unsets superVminEn register before continuing to raise voltage Controller pauses voltage increase and sets superVmaxEn register before continuing to raise voltage VDD where VDDM-VDD=superVminThreshold VDD where VDD-VDDM=superVmaxThreshold
  • 20.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 20 of 33 • Moving precharge to VDD reduced our current enough to allow on die-distribution but presents other challenges • Read before write timing challenges at low VDD, high VDDM VDD Precharge Timing Challenges WL@ constant VDDM BLPCX @ high VDD BLPCX @ low VDD WL on before Bitline precharge turns off at low VDD! Bitline precharge turns on before WL turns off at high VDD! Power races with WL Read before write challenges
  • 21.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 21 of 33 • Solving these multiple voltage timing challenges required a number of techniques – Dual voltage clock shapers to average two voltage domains • Can alter the number of these buffers on VDD or VDDM or remove them entirely to make timing more or less dependent on either supply – False read before write problem can be mitigated by compressing the front end of the WL during a write operation Solving Timing Challenges ISOX@VDDM Input@VDD shapedFallInput @VDD VDDM LS LS VDD Psuedo-dynamic level shifter WREN WLCLK WLCLK_shape WLCLK WL during read WL during write
  • 22.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 22 of 33 CAC Comparison • 3% decrease in flop power allocates more budget for combinational logic
  • 23.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 23 of 33 FLOP Palette Improvements • Rich flop library, balance timing/power needs by driving right flop mix • Up to 8% Fmax benefit from high speed flops in timing critical loop paths Best for Performance Best for Power
  • 24.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 24 of 33 Low Power Gater Latch Energy with AvgApp Activity (fJ) State LP Latch Regular Latch Ratio E=1 0.22 0.18 121% E=0 0.17 1.61 10% Total 0.38 1.79 22% E TE CLKB CLKBB CLKBBCLKB CLKBB CLKB qf_x qf Q Dbar Dbar qf CLK • 90% Power savings in latch for common case of E = 0 through internal self gating • Clock gater latch power contribution from 22% in Zen to 13% in Zen 2 for an average application
  • 25.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 25 of 33 Zen 2 Clock Optimization • Multi mesh plan for the core supported by configurable clock tree construction – FP level mesh gating enabled with minimal timing/area overhead – 15% Mesh power savings in Idle and Average App • Tight clock skew distribution • Relocated clock spines and technology shrink (vs. Zen) achieves similar skew profile while reducing CAC
  • 26.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 26 of 33 Zen vs. Zen 2 CAC Comparison • Primary sources of CAC reduction – 14 nm to 7 nm scaling – 6 track library – Aggressive microarchitectural CAC optimizations
  • 27.
    2.1: Zen 2:The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core© 2020 IEEE International Solid-State Circuits Conference 27 of 33 Generational Leadership Perf/Watt • Performance/Watt driven by a combination of technology and design improvements • Timing – Improved scalability by optimizing at a wider voltage range compared to Zen – Multi-corner optimization • Library choice and optimization – 6 track library enabled additional CAC/leakage savings in addition to default technology entitlement • Design CAC – MBFF, low power clock-gater library optimization – RTL improvements – CAC aware downsizing methodology Zen power @ 100% IPC Zen2 power @ 115% IPC 7nm CAC Savings 7nm Timing Design CAC Savings Library Choice Power Improvements – ISO Frequency
  • 28.
    Frequency/Power Silicon Results
    • 4 cores active with 2 threads per core
    • The combined effect of lower Vmin for the same frequency and reduced CAC enabled a 50% reduction in power for a given frequency throughout most of the F(P) curve
    • This enables 2x cores in the same socket!!
    [Figure: F(P) curves, annotated "50% power reduction"]
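To see how a lower voltage and a lower CAC combine into a halved power figure, the standard first-order dynamic power relation P ≈ CAC · V² · f can be used. The sketch below is only an illustration under that model: it ignores leakage, and the CAC ratio, the per-core wattages and the power budget are hypothetical values, since the presentation does not give the split between voltage and CAC contributions.

```python
import math


def dynamic_power(cac: float, voltage: float, frequency: float) -> float:
    """First-order dynamic power: P = CAC * V^2 * f (leakage ignored)."""
    return cac * voltage ** 2 * frequency


def voltage_ratio_for_power_ratio(power_ratio: float, cac_ratio: float) -> float:
    """Voltage scaling needed at iso-frequency to hit a target power ratio."""
    return math.sqrt(power_ratio / cac_ratio)


def cores_in_budget(budget_w: float, per_core_w: float) -> int:
    """Number of cores that fit in a fixed power budget."""
    return math.floor(budget_w / per_core_w)


# Hypothetical example: if CAC fell to 70% of its previous value, what
# voltage ratio would give 50% power at the same frequency?
v_ratio = voltage_ratio_for_power_ratio(power_ratio=0.5, cac_ratio=0.7)
print(f"required voltage ratio: {v_ratio:.3f}")

# Halving per-core power doubles the core count in the same budget
# (hypothetical 100 W budget, 25 W vs. 12.5 W per core).
print(cores_in_budget(100.0, 25.0), "->", cores_in_budget(100.0, 12.5))
```

Because power scales with the square of voltage, a modest Vmin reduction contributes disproportionately, which is consistent with the slide crediting both effects together rather than CAC alone.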
  • 29.
    Frequency/Voltage Silicon Results
    • 1 core active with 2 threads per core, 3 cores idle
    • F/V curve improved over all voltages
    • Design worked to improve low-voltage performance for improved linearity
    • Wide voltage range
  • 30.
    Conclusion
    • Met goals
      – Moved to energy-efficient TSMC 7nm finFET
      – Made huge architectural changes
      – Improved PD and methodology
    • Results are clear
      – Scalable from 15W mobile to 280W server
      – 50% reduced power at iso-frequency
      – Enables 2x cores in the same socket
      – >15% 1T IPC over previous generation
      – ~9% CAC improvement over previous generation (technology-neutral)
      – Enables peak frequencies up to 4.7GHz (+350MHz generationally)
    • Zen 2 delivers generational performance uplift!!
  • 31.
    Acknowledgements
    • We would like to thank our talented AMD design team across Austin, Fort Collins, Santa Clara, Boston, Markham, and India who contributed to Zen 2
    • Please stay for our chiplet paper next
  • 32.
    Disclaimer and Endnotes
    DISCLAIMER
    The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18
    All rights reserved. AMD, the AMD Arrow logo, EPYC, RYZEN, Threadripper, Radeon, Infinity Fabric, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.