Delivering a new level of visual performance in an SoC AMD "Raven Ridge" APU

DELIVERING A NEW LEVEL
OF VISUAL PERFORMANCE
IN AN SOC
AMD “RAVEN RIDGE” APU
AMD CONFIDENTIAL
Dan Bouvier, Jim Gibney, Alex Branover, Sonu Arora
Presented by:
Dan Bouvier
Corporate VP, Client Products Chief Architect

| AMD Ryzen™ Processors with Radeon™ Vega Graphics - Hot Chips 30 |2
RAISING THE BAR FOR THE APU VISUAL EXPERIENCE
▪ FIRST “Zen”-based APU
▪ HIGH-PERFORMANCE on-die “Vega”-based graphics
▪ LONG BATTERY LIFE, premium form factors
▪ Scaled GPU and CPU up to reach target frame rate
▪ Managed power delivery and thermal dissipation
▪ Improved memory bandwidth efficiency
▪ Upgraded display experience
▪ Increased package performance density
MOBILE APU GENERATIONAL PERFORMANCE GAINS
AMD Ryzen™ 7 2700U compared with 7th Gen AMD A-Series APU (CPU performance, GPU performance, power):
▪ Up to 200% MORE CPU PERFORMANCE
▪ Up to 128% MORE GPU PERFORMANCE
▪ Up to 58% LESS POWER
* See footnotes for details.
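The “up to 58% less power” figure can be checked against the Cinebench R15 nT energy values given in the Slide 2 footnote (3782 joules for the AMD FX™ 9800P, 1594 joules for the AMD Ryzen™ 7 2700U). A minimal sketch of that arithmetic follows; the function and constant names are illustrative and do not come from the deck.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Return how much smaller `new` is than `baseline`, in percent."""
    return (1.0 - new / baseline) * 100.0


# Energy for one complete Cinebench R15 nT run, taken from the Slide 2 footnote.
FX_9800P_JOULES = 3782.0
RYZEN_2700U_JOULES = 1594.0

energy_saving = percent_reduction(FX_9800P_JOULES, RYZEN_2700U_JOULES)
print(f"Energy reduction: {energy_saving:.2f}%")  # rounds to the 58% quoted on the slide
```

Note that the footnote measures energy per completed run (joules), so the figure reflects both lower power draw and shorter run time.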

“RAVEN RIDGE” APU
Block diagram:
▪ AMD “ZEN” x86 CPU CORES: “Zen” CPU (4 core | 8 thread), CPU 0 to CPU 3, 4MB L3 Cache
▪ AMD “VEGA” GPU: AMD GFX+ (11 compute units), 1MB L2 Cache
▪ HIGH BANDWIDTH SOC FABRIC & MEMORY SYSTEM: Infinity Fabric, two x64 DDR4 interfaces
▪ FULL SYSTEM CONNECTIVITY: USB 3.1 / USB 2.0, NVMe / SATA, PCIe GPP, PCIe Discrete GFX
▪ UPGRADED DISPLAY ENGINE: Display Controller Next
▪ ACCELERATED MULTIMEDIA EXPERIENCE: Multimedia Engines, Video Codec Next, Audio ACP
▪ INTEGRATED SENSOR FUSION HUB
▪ Platform Security Processor, System Management Unit

“Raven Ridge” die
▪ Technology: GLOBALFOUNDRIES 14nm, 11 layer metal
▪ Transistor count: 4.94B
▪ Die size: 209.78 mm²
▪ BGA package: 25 x 35 x 1.38 mm
SIGNIFICANT DENSITY INCREASE
▪ 59% more transistors and a 16% smaller die than the prior generation “Bristol Ridge” APU
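The two density percentages follow from the die figures on this slide and the “Bristol Ridge” figures in the Slide 4 footnote (250.04 mm², 3.1 billion transistors). The sketch below reproduces them and also derives an average transistor density, which the deck does not quote; the variable and function names are illustrative.

```python
# "Raven Ridge" values from this slide; "Bristol Ridge" values from the Slide 4 footnote.
RAVEN = {"transistors": 4.94e9, "die_mm2": 209.78}
BRISTOL = {"transistors": 3.1e9, "die_mm2": 250.04}

more_transistors_pct = (RAVEN["transistors"] / BRISTOL["transistors"] - 1.0) * 100.0
smaller_die_pct = (1.0 - RAVEN["die_mm2"] / BRISTOL["die_mm2"]) * 100.0


def density_mtr_per_mm2(chip: dict) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    return chip["transistors"] / chip["die_mm2"] / 1e6


# Derived here, not stated in the deck: ratio of average densities.
density_gain = density_mtr_per_mm2(RAVEN) / density_mtr_per_mm2(BRISTOL)

print(f"{more_transistors_pct:.1f}% more transistors")
print(f"{smaller_die_pct:.1f}% smaller die")
print(f"{density_gain:.2f}x average transistor density")
```

Taken together, the two percentages imply roughly 1.9x the average transistor density of the prior generation.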

INTEGRATED “VEGA” GRAPHICS
Graphics Engine
▪ Up to 11 Next Gen Compute Units (NCU)
▪ 1 MB L2
▪ Flexible Geometry Engine
▪ 1 Draw Stream Binning Rasterizer
▪ 16 Pixel Units (32bpp)
▪ 44 Texture Units
DirectX® 12.1 Features
▪ Conservative Rasterization
▪ Raster Ordered Views
▪ Standard Swizzle
▪ Axis Aligned Rectangular Primitives
Throughput at 11 NCU
▪ 1200 MTri/sec @ 1200 MHz
▪ Rendering 19.2 GPix/sec @ 1200 MHz
▪ 1690 FP32 GFLOPS /
3379 FP16 GFLOPS @ 1200 MHz
▪ 52.8 GTex/sec @ 1200 MHz
Block diagram: Infinity Fabric, Core Fabric, 4 x L2, Geometry/Raster/RB+, CP, 4 x ACE, sDMA, Shader System (SS) Workload Manager, NCU Array
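The throughput figures for 11 NCUs can be reproduced from the unit counts and the 1200 MHz clock. The sketch below is a sanity check under stated assumptions, not AMD's derivation: it assumes 64 FP32 lanes per NCU, a fused multiply-add counted as 2 FLOPs, double-rate packed FP16, and one triangle, pixel, or texel per unit per clock. None of those per-clock rates are stated on the slide. With 44 texture units, the texture rate comes out in billions of texels per second.

```python
CLOCK_MHZ = 1200
NCU_COUNT = 11
TEXTURE_UNITS = 44
PIXEL_UNITS = 16

# Assumptions (not stated on the slide):
LANES_PER_NCU = 64        # FP32 lanes per compute unit
FLOPS_PER_FMA = 2         # a fused multiply-add counts as two operations
FP16_RATE = 2             # packed FP16 runs at twice the FP32 rate
TRIANGLES_PER_CLOCK = 1   # geometry engine output per clock


def giga_per_second(units_per_clock: int, clock_mhz: int) -> float:
    """Convert a per-clock rate into billions of operations per second."""
    return units_per_clock * clock_mhz / 1000.0


fp32_gflops = giga_per_second(NCU_COUNT * LANES_PER_NCU * FLOPS_PER_FMA, CLOCK_MHZ)
fp16_gflops = FP16_RATE * fp32_gflops
gtex_per_s = giga_per_second(TEXTURE_UNITS, CLOCK_MHZ)
gpix_per_s = giga_per_second(PIXEL_UNITS, CLOCK_MHZ)
mtri_per_s = TRIANGLES_PER_CLOCK * CLOCK_MHZ

print(f"FP32: {fp32_gflops:.1f} GFLOPS, FP16: {fp16_gflops:.1f} GFLOPS")
print(f"Texture: {gtex_per_s:.1f} GTex/s, Pixel: {gpix_per_s:.1f} GPix/s")
print(f"Geometry: {mtri_per_s} MTri/s")
```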

“ZEN” CPU IMPROVES VISUAL FRAME RATE
Core diagram:
▪ Front end: Branch Prediction, 64K I-Cache (4 way), Decode (4 instructions/cycle), Op Cache, Micro-op Queue, 6 ops dispatched
▪ INTEGER: Integer Rename, Schedulers, Integer Physical Register File, 4 ALUs, 2 AGUs
▪ Load/Store Queues: 2 loads + 1 store per cycle, 32K D-Cache (8 way)
▪ FLOATING POINT: Floating Point Rename, Scheduler, FP Register File, 2 MUL and 2 ADD units
▪ 512K L2 (I+D) Cache (8 way)
CPU complex diagram: CORE 0, CORE 1, CORE 2, and CORE 3, each with a 512K L2M, L2CTL, L3CTL, and two 512K L3M slices (4MB of shared L3 in total)
High performance “Zen” core
▪ Up to 4 “ZEN” CPU cores
▪ 512KB L2 Cache per core
▪ 4MB Shared L3 Cache
▪ Free up more power for GPU

“ZEN” WITH PRECISION BOOST 2
▪ Governed by CPU temperature,
current, load
▪ Seeks highest possible frequency from
environmental inputs, graceful roll-off
▪ Opens new boost opportunities for
real-world nT workloads (e.g., games)
▪ 25MHz granularity
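One way to picture the 25MHz granularity is as a quantization step applied to whatever frequency ceiling the temperature, current, and load inputs allow. The sketch below is purely illustrative: the deck does not describe the actual control algorithm, and the function names, the min-of-limits policy, and the sample frequencies are all assumptions.

```python
STEP_MHZ = 25  # granularity stated on the slide


def quantize_down(limit_mhz: float, step_mhz: int = STEP_MHZ) -> int:
    """Round a frequency ceiling down to the nearest allowed step."""
    return int(limit_mhz // step_mhz) * step_mhz


def boost_target(temp_limit_mhz: float,
                 current_limit_mhz: float,
                 load_limit_mhz: float) -> int:
    """Hypothetical policy: take the tightest limit, then snap to the 25MHz grid."""
    return quantize_down(min(temp_limit_mhz, current_limit_mhz, load_limit_mhz))


# Made-up ceilings for illustration only.
target = boost_target(temp_limit_mhz=3812.0,
                      current_limit_mhz=3790.0,
                      load_limit_mhz=3800.0)
print(f"Boost target: {target} MHz")
```

With a fine step, a small loss of headroom costs only one 25MHz notch instead of a drop to a much lower fixed state, which is one reading of the “graceful roll-off” bullet.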

TUNE FOR THE PHASES OF VISUAL WORKLOADS
STEER POWER WHERE IT’S BEST USED
▪ Trade power/current based on
dynamic utilization:
− Core ↔ Core
− CPU ↔ GPU
▪ On-die regulation and fine-grained
frequency control enable fast,
accurate frequency and voltage
changes
▪ Fine-grained p-states (FGPS) across
the IPs - continuous frequency control

“ZEN” CPU AND “VEGA” GFX CO-MANAGEMENT
WITH INFINITY FABRIC
▪ CPU threads feed major GPU resources:
3D engine, compute engine, and DMA
engine (data fetch and writeback)
▪ CPU “submits” tasks, GFX “renders” or
“computes”
▪ One coherent control and data interface to
integrate and manage the full SoC
▪ Power budgeting based on activity and
efficiency
▪ Enhanced flow for quiescing/powering-off
CPU-GFX component
Diagram: Infinity Fabric connects the “ZEN” CORE COMPLEX (four “Zen” cores, L3 cache), “VEGA” GRAPHICS (graphics pipeline, compute engine, pixel engines, L2 cache), the multimedia engines, the display engine, the I/O and system hub, and the DDR4 memory controllers.

FAST DEPLOYMENT OF NEW ARCHITECTURE
MODULAR AMD INFINITY FABRIC
▪ Standard port definition for IP connections
(SDP = Scalable Data Port)
− Common interface definition used for CPU, GPU,
I/O, multimedia hubs, display, memory controller
▪ Coherent HyperTransport™ transport layer
− Builds upon generations of coherent fabric
development
− Flexible topology to adapt to diverse SoC
configurations
▪ SDP hides complexities of coherence
protocol from connected IP
Diagram: engines, memory controllers, the I/O subsystem, and accelerators attach through SDP interface modules to a common transport layer.

“RAVEN RIDGE” INFINITY FABRIC
“Raven Ridge” Optimizations
▪ 32 Byte internal datapath width
▪ Up to 1.6GHz for bandwidth
exceeding 50GB/s
▪ Up to 5 transfers/clock per switch
▪ Improved CPU latency under load,
while maintaining DRAM efficiency
▪ Structured for multi-region
power gating
▪ Floorplan-aware, optimized display
to memory routing
Diagram: four transport layer switches link the CPU core complex, graphics containers, multimedia hub, display controller, and I/O subsystem to the memory controllers. The attachment points are labeled Coherent Master, Non Coherent Master, IO Master/Slave, and Coherent Slave; one fabric region is labeled Region A.
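The “exceeding 50GB/s” figure is consistent with the 32 byte datapath running at up to 1.6GHz. A minimal sketch of that arithmetic, assuming one full-width transfer per clock on a single datapath (the slide does not state the per-clock rate); the function name is illustrative.

```python
def fabric_bandwidth_gb_s(datapath_bytes: int, clock_ghz: float) -> float:
    """Peak bandwidth of one datapath, assuming one full-width transfer per clock."""
    return datapath_bytes * clock_ghz


peak = fabric_bandwidth_gb_s(datapath_bytes=32, clock_ghz=1.6)
print(f"Peak per-datapath bandwidth: {peak:.1f} GB/s")
```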

QUALITY OF SERVICE FOR SMOOTH VISUAL EXPERIENCE
Picker arbitration is generally age ordered,
except when a younger request passes an older one due to:
1) priority
2) VC resource availability
3) another resource constraint, such as a busy output port
Three Request Classes
▪ Hard real time:
− High BW (e.g., display surface refresh)
− Low BW (e.g., audio)
▪ Soft real time (e.g., video playback)
▪ Non real time
(e.g., typical CPU/GPU/IO requests)
Architectural Mechanisms
▪ Multiple virtual channels
▪ Priority classes (Low/Medium/High/Urgent)
▪ End-to-end priority escalation by VC for out
of bounds conditions
Switch-level View of QoS Architecture (diagram): the transport request, response, probe, and data queues each feed their own pickers; VC buffers are backed by dedicated tokens and shared pool tokens.
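The picker rule on this slide (age ordered, unless priority, VC resources, or a busy output port intervene) can be sketched as a small selection function. This is a toy model under stated assumptions, not the hardware arbiter: the `Request` fields, the VC names, the single token count per VC, and the strict "priority first, then age" ordering are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List, Optional, Set


class Priority(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    URGENT = 3


@dataclass(frozen=True)
class Request:
    age: int            # cycles spent waiting in the queue; larger means older
    priority: Priority
    vc: str             # virtual channel label (hypothetical names)
    out_port: int       # switch output port the request needs


def pick(queue: List[Request],
         vc_tokens: Dict[str, int],
         busy_ports: Set[int]) -> Optional[Request]:
    """Pick one request: oldest first, unless priority or resources say otherwise."""
    eligible = [
        r for r in queue
        if vc_tokens.get(r.vc, 0) > 0        # 2) VC resource availability
        and r.out_port not in busy_ports     # 3) output port must be free
    ]
    if not eligible:
        return None
    # 1) higher priority wins; age breaks ties, so equal-priority
    #    traffic stays age ordered.
    return max(eligible, key=lambda r: (r.priority, r.age))


queue = [
    Request(age=10, priority=Priority.LOW, vc="non_real_time", out_port=0),
    Request(age=7, priority=Priority.LOW, vc="non_real_time", out_port=1),
    Request(age=3, priority=Priority.URGENT, vc="hard_real_time", out_port=1),
]
tokens = {"non_real_time": 1, "hard_real_time": 1}
winner = pick(queue, tokens, busy_ports=set())
print(winner)
```

In this example the youngest request wins because it carries Urgent priority. If the hard real time VC had no tokens, the oldest Low priority request would be picked instead.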

MEMORY BOUND PERFORMANCE OPTIMIZATION
New features and optimized SoC
configuration contribute to
improved memory-limited
performance:
▪ Caching and algorithms to reduce
memory requests
▪ Improved lossless compression
usage (DCC)
▪ Better request ordering to reduce
DRAM page conflicts and
read/write turnarounds
Diagram: the “Zen” CPU core complex (4MB L3 cache), “Vega” GFX engine, display controller, and multimedia hub connect through the fabric transport layer to the memory controllers.
Diagram callouts:
▪ 1MB Shared L2 Cache
▪ Larger dedicated GPU TLB cache
▪ Deferred Primitive Batch Binning
▪ Multi Level DRAM Aware Reordering
▪ Deeper Arbitration Queues
▪ Direct Reads of Compressed memory
▪ Memory Efficient Quality of Service
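The idea behind “better request ordering to reduce DRAM page conflicts and read/write turnarounds” can be illustrated with a toy model. The sketch below is not the “Raven Ridge” memory controller algorithm: it assumes a single DRAM bank, counts a page conflict whenever consecutive requests target different rows, and reorders with a plain sort. It also ignores ordering hazards and latency bounds that a real controller must respect. All names and the sample request stream are made up.

```python
from typing import List, Tuple

Request = Tuple[int, bool]  # (DRAM row, is_write); a single bank is assumed


def count_penalties(reqs: List[Request]) -> Tuple[int, int]:
    """Return (page conflicts, read/write turnarounds) for an issue order."""
    pairs = list(zip(reqs, reqs[1:]))
    conflicts = sum(1 for prev, cur in pairs if prev[0] != cur[0])
    turnarounds = sum(1 for prev, cur in pairs if prev[1] != cur[1])
    return conflicts, turnarounds


def reorder(reqs: List[Request]) -> List[Request]:
    """Group reads before writes, and same-row requests together within each group.

    A real controller would only reorder within a bounded window and would
    preserve read-after-write ordering to the same address; neither is modeled.
    """
    return sorted(reqs, key=lambda r: (r[1], r[0]))


arrival = [(1, False), (2, True), (1, True), (2, False), (1, False), (2, True)]
before = count_penalties(arrival)
after = count_penalties(reorder(arrival))
print(f"arrival order:   conflicts={before[0]}, turnarounds={before[1]}")
print(f"reordered issue: conflicts={after[0]}, turnarounds={after[1]}")
```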

“RAVEN RIDGE” GRAPHICS SCALING
GENERATIONAL IMPROVEMENTS FOR MEMORY BOUND GAMING PERFORMANCE
Gaming performance scaling uplift
due to new AMD Vega GPU features:
▪ 4x larger GFX L2 cache, unified
across all graphics clients
▪ DSBR (Draw Stream Binning
Rasterizer) feature reduces
bandwidth
▪ Improved lossless DCC memory
compression

NEW GENERATION DISPLAY AND VIDEO CODEC ENGINE
Display Engine (DCN)
▪ Flexible display pipe architecture
− Up to four 4kp60 displays
▪ Low power display engine with DCC, 4K2K@60Hz @Vmin
▪ HDR support
− From 32bpp to 64bpp surfaces
− From sRGB to BT2020
▪ Higher bandwidth interfaces - HDMI 2.1, DP 1.4, HBR3
▪ USB-Type C with display alt-mode
Video Codec (VCN)
▪ Unified encode and decode engine
− Up to 4kp60 HEVC 10b decode
− Up to 4kp30 HEVC 8b encode
▪ Low power video playback – 4kp30 @Vmin
▪ HEVC 10b decode
▪ HEVC encode for superior quality Skype
▪ VP9 decode for efficient YouTube playback
Diagram: DISPLAY ENGINE with a Memory Interface Hub on Infinity Fabric, four Input Processing stages, and four Output Pipes, plus a Type C display path (AltMode Ctrl, USB/DP Mux, USB).

EFFICIENT POWER DELIVERY
WITH DIGITAL LOW-DROPOUT REGULATORS
▪ Current delivery overprovisioned for worst-case
overlap between CPU and GPU
▪ Fine-grain LDO control allows for efficient
tracking of the CPU and GFX phases, powered
by a unified VDD power rail
▪ 1st stage: off-chip motherboard vreg
2nd stage: on-chip vreg with digital LDO
▪ Multiple digital LDO regions for CPU cores,
graphics core, and sub-regions
− Idle engine is powered off
▪ Allows more peak CPU/GPU current to improve
boost performance
Diagram: the System Voltage Regulator feeds the VDD package rail. Within the “ZEN” CORE COMPLEX and “VEGA” GRAPHICS COMPLEX the diagram labels the CPU Region (CPU 0 to CPU 3), L3, GFX Region, GFX Compute Region, and VDD Region; shading represents an LDO regulated / power gating region.

SYNERGISTIC POWER RAIL SHARING
WITH DIGITAL LDO REGULATORS
▪ Shared regulator reduces total
regulator current requirements
▪ Less motherboard power supply
footprint
▪ More peak CPU/GPU current to
improve boost performance
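The Slide 17 footnote gives the numbers behind both claims: “Bristol Ridge” at 15W TDP needs separate 35A VDDCR_CPU and 35A VDDCR_GFX supplies, while “Raven Ridge” at 15W TDP needs one 45A VDDCR_VDD supply. The sketch below works through the arithmetic. Treating the full shared limit as available to a single engine while the other is idle is an assumption made here for illustration, not a statement from the deck.

```python
# EDC limits in amps, from the Slide 17 footnote (15W TDP infrastructure).
BRISTOL_RIDGE_EDC_A = {"VDDCR_CPU": 35, "VDDCR_GFX": 35}
RAVEN_RIDGE_EDC_A = {"VDDCR_VDD": 45}

separate_total = sum(BRISTOL_RIDGE_EDC_A.values())   # two dedicated rails
shared_total = sum(RAVEN_RIDGE_EDC_A.values())       # one shared rail

# Reduction in total regulator current the motherboard must provision.
regulator_saving_pct = (1 - shared_total / separate_total) * 100

# Assumption: one engine may draw up to the whole shared limit when the other idles.
peak_gain_pct = (shared_total / max(BRISTOL_RIDGE_EDC_A.values()) - 1) * 100

print(f"Total provisioned current: {separate_total} A -> {shared_total} A "
      f"({regulator_saving_pct:.1f}% less)")
print(f"Peak current available to one engine: up to {peak_gain_pct:.1f}% more")
```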

ENHANCED POWER OFF STATE
CPU AND GPU
For CPU Cores
▪ Each core can enter CC6 power gating
▪ CPUOFF can lower L3 cache power when
all cores are in CC6
For Graphics
▪ Gating can power down up to 95% of
the GPU
▪ GFXOFF can further power down GPU
un-core (aka GPU monitor logic)
GFXOFF + CPUOFF = VDDOFF;
halts System VDD Regulator
▪ Up to 99% residency in Windows static
screen idle*
State diagram (axes: deeper low power states, faster entry/exit latencies):
▪ CPU: Active States, Deep Sleep States, Clock Gated States → CC6 (meet CC6 entry timer) → CPUOFF (all cores in CC6 and meet CPUOFF entry timer)
▪ GFX: Active States, Clock Gated States → Graphics Power Gating (meet GFX idle entry timer) → GFXOFF (meet GFXOFF entry timer)
▪ VDDOFF: enter if simultaneous CPUOFF and GFXOFF
Power gating tiers shown alongside the states:
▪ Region power gating by LDO PG headers: latencies 100us or less
▪ Multiple LDO regions gated: latencies 1.5ms or less
▪ Input VDD rail off
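The entry conditions on this slide compose into a simple decision: each domain deepens its own state when its entry timer is met, and VDDOFF is entered only when CPUOFF and GFXOFF hold at the same time. The sketch below models just that logic. The function names, the boolean inputs, and the `GFX_PG` and `VDD_ON` labels are hypothetical, and real entry and exit sequencing is not modeled.

```python
from typing import Sequence


def cpu_state(cores_in_cc6: Sequence[bool], cpuoff_timer_met: bool) -> str:
    """State of the CPU complex as a whole."""
    if not all(cores_in_cc6):
        return "ACTIVE"   # at least one core is not power gated
    return "CPUOFF" if cpuoff_timer_met else "CC6"


def gfx_state(gfx_power_gated: bool, gfxoff_timer_met: bool) -> str:
    """State of the graphics complex."""
    if not gfx_power_gated:
        return "ACTIVE"
    return "GFXOFF" if gfxoff_timer_met else "GFX_PG"


def vdd_state(cpu: str, gfx: str) -> str:
    """GFXOFF + CPUOFF = VDDOFF; anything else keeps the VDD regulator running."""
    return "VDDOFF" if (cpu == "CPUOFF" and gfx == "GFXOFF") else "VDD_ON"


# Static-screen idle: every core gated, graphics gated, both entry timers met.
idle = vdd_state(cpu_state([True] * 4, cpuoff_timer_met=True),
                 gfx_state(True, gfxoff_timer_met=True))
print(idle)
```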

MORE THERMAL COMPUTE HEADROOM IN NOTEBOOKS
SKIN TEMPERATURE AWARE POWER MANAGEMENT (STAPM)
Before STAPM:
APU guard-banded to Tj~60C to meet
Tskin requirements
After STAPM:
Delta between ambient and Tskin
calculated based on the power/activity
of system components
Conceptual example of behavior

3DMARK® TIME SPY
▪ Notebook: AMD Ryzen™ 7 2700U, Core i7-8550U, Core i7-7500U, AMD FX™ 9800P
▪ Desktop: AMD Ryzen™ 5 2400G, Core i5-8400, Core i5-7400
* See footnotes for details.
GAMING ON THE GO IN AN ULTRATHIN
Chart: average FPS against a GOOD VISUAL THRESHOLD of 30 FPS
▪ League of Legends™: 1080p, DirectX® 9, Medium
▪ DOTA™ 2: 1080p, DirectX® 11, Fastest+
▪ CS:GO™: 1080p, DirectX® 9, Medium, No MSAA
▪ Quake® Champions: 1280x720, DirectX® 11, High
▪ Overwatch™: 1280x720, DirectX® 11, Low, 79% Render Scale
AMD RYZEN™ 5 2400G DESKTOP PROCESSOR
TRUE HIGH-DEFINITION 1080P GAME PERFORMANCE
Chart: average FPS against a GOOD VISUAL THRESHOLD of 30 FPS
▪ Battlefield 1: 1080p, Low, DX12
▪ Overwatch™: 1080p, Medium
▪ Rocket League: 1080p, Medium
▪ Skyrim: 1080p, Medium
▪ Witcher 3: 1080p, Low, Hair Works Off

AMD ACCELERATING ENERGY EFFICIENCY
ON TRACK TO ACHIEVE OUR GOAL
Developing energy-efficient processors
has long been a design focus at AMD.
In 2014, AMD set a bold “25x20” goal to
deliver at least 25X more energy efficiency
in our mobile processors by 2020. Visit
AMD.com/25x20.
Chart: energy efficiency of AMD APUs* against the “25x20” goal (25X additional energy efficiency by 2020, 2014–2020), with 2017 marked.

▪ The true potential of the APU realized by combining “Zen” CPU with “Vega” Graphics
▪ Advances in power and thermal management provide more headroom for visual throughput
▪ Data movement improvements at all levels to reduce bandwidth bottlenecks

FOOTNOTES
Slide 2: Based on AMD testing as of 9/28/2017. System configuration(s): AMD Reference Motherboard (2700U), HP ENVY X360 (FX-9800P/“7th Gen APU”), Samsung 850 Pro SSD, Windows 10 x64 1703,
1920x1080. AMD Ryzen™ 7 2700U Graphics Driver: 23.20.768.9. AMD FX-9800P Graphics Driver: 22.19.662.4. 1x8GB DDR4-2133 (AMD FX-9800P). 2x4GB DDR4-2400 (AMD Ryzen™ 7 2700U). Power
Consumption defined as joules of energy consumed during a complete run of Cinebench R15 nT: AMD FX™ 9800P = 3782 joules (100%) vs. AMD Ryzen™ 7 2700U = 1594 joules (58% less). Different configurations
may yield different results.
Slide 4: Based on “Bristol Ridge” die size of 250.04mm2 and transistor count of 3.1 billion.
Slide 7: Based on AMD testing as of 9/25/2017. System configuration(s): AMD Reference Platform, AMD Ryzen™ 7 2700U APU, 2x4GB DDR4-2400, graphics driver 17.30.2015. AMD SenseMI technology is
built into all Ryzen processors, but specific features and their enablement may vary by product and platform. Learn more at http://www.amd.com/en/technologies/sense-mi.
Slide 8: Based on AMD testing as of 10/11/2017. Clock speed plot is a snapshot of 8 seconds of 3DMark Fire Strike. “Effective frequency” is the product of the reported clock speed and %time in active
workload C0 C-state.
Slide 14: Based on AMD testing as of 6/11/2018. System configuration(s): AMD “Bristol Ridge” Mobile APU reference platform, AMD FX-9800P, 2x8GB DDR4-2400, Crucial BX100 SSD, Windows 10 x64 Build
16299, Graphics Driver: 21.19.384.20, BIOS: TMY130BA; AMD Ryzen™ Mobile APU reference platform, AMD Ryzen™ 7 2700U, 2x8GB DDR4-2400, WD7500BPKX, Windows 10 x64 Build 16299, Graphics
Driver: 24.20.154.6220, BIOS: WGV8215N
Slide 17: Based on AMD infrastructure requirements for “Bristol Ridge” 15W TDP (VDDCR_CPU supply EDC limit is 35A, VDDCR_GFX supply EDC limit is 35A), and AMD infrastructure requirements for “Raven
Ridge” 15W TDP (VDDCR_VDD supply EDC limit is 45A).
Slide 18: Based on AMD internal data of an optimized AMD Ryzen™ Mobile APU reference platform as of 9/25/2017. PC manufacturers may vary configuration yielding different results.
Slide 20: Notebook: Based on AMD testing as of 9/25/2017. Common system configurations: Samsung 850 Pro SSD, Windows 10 x64 1703, 1920x1080; Intel Graphics Driver: 22.20.16.4691; AMD Ryzen™
mobile APU Graphics Driver: 23.20.768.9; AMD FX-9800P Graphics Driver: 22.19.662.4; AMD FX-9800P configured in HP ENVY X360 (1x8GB DDR4-2133). AMD Ryzen™ 7 2700U configured in AMD reference
platform (2x4GB DDR4-2400). Core i7-8550U configured in Acer Swift 3 (2x4GB DDR4-2400). Core i7-7500U configured in HP ENVY X360 (2x4GB DDR4-2400). Graphics results measured with 3DMark®
TimeSpy. Core i7-8550U score (350) is baseline 100%. Core i7-7500U score (377) is 107% of baseline. AMD FX-9800P score (400) is 114% of baseline. AMD Ryzen™ 7 2700U score (915) is 261% of baseline.
Different configurations may yield different results.
Desktop: Common system configurations: Samsung 850 Pro SSD, Windows 10 x64 Pro RS3, 1920x1080; Intel i5 8400 Graphics Driver: 15.47.02.4815; Intel I5-7400 Graphics Driver: 15.46.05.4771; AMD
Ryzen™ mobile APU Graphics Driver: CL1491290-171206a-321461E 2.1.1 RC5 17.40 RC19; AMD Ryzen™ 5 2400G configured in AMD reference platform (2x8GB DDR4-2667). Core i5-8400 configured in Z370
Aorus Gaming 5 (2x8GB DDR4-2667). Core i5-7400 configured in B250 Gaming M3 (2x8GB DDR4-2400).
Slide 21: Based on AMD testing as of 9/25/2017. System configuration(s): HP ENVY X360, AMD Ryzen™ 7 2700U, 2x4GB DDR4-2400, Samsung 850 Pro SSD, Windows 10 x64 1703, Graphics Driver:
17.30.1025, BIOS F11.
Desktop Testing by AMD Performance labs as of 01/02/2018 on the following systems. PC manufacturers may vary configurations yielding different results. Results may vary based on driver versions used.
System Configs: All systems equipped with 16GB dual-channel DDR4 @ 2666 MHz, Samsung 850 PRO 512GB SSD, Windows 10 RS2 operating system. Socket AM4 System: AMD Ryzen 5 2400G, AMD Ryzen
3 2200G, Myrtle RV motherboard. Graphics driver 23.20.768.0 (17.40).
Slide 22: Data source: AMD confidential based on internal test results of upcoming “Raven Ridge” APU.

ATTRIBUTION
DISCLAIMER
THE INFORMATION CONTAINED HEREIN IS FOR INFORMATIONAL PURPOSES ONLY, AND IS SUBJECT TO CHANGE WITHOUT NOTICE. WHILE EVERY PRECAUTION HAS BEEN TAKEN IN THE
PREPARATION OF THIS DOCUMENT, IT MAY CONTAIN TECHNICAL INACCURACIES, OMISSIONS AND TYPOGRAPHICAL ERRORS, AND AMD IS UNDER NO OBLIGATION TO UPDATE OR OTHERWISE
CORRECT THIS INFORMATION. ADVANCED MICRO DEVICES, INC. MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS
DOCUMENT, AND ASSUMES NO LIABILITY OF ANY KIND, INCLUDING THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY OR FITNESS FOR PARTICULAR PURPOSES, WITH RESPECT
TO THE OPERATION OR USE OF AMD HARDWARE, SOFTWARE OR OTHER PRODUCTS DESCRIBED HEREIN. NO LICENSE, INCLUDING IMPLIED OR ARISING BY ESTOPPEL, TO ANY INTELLECTUAL
PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. TERMS AND LIMITATIONS APPLICABLE TO THE PURCHASE OR USE OF AMD’S PRODUCTS ARE AS SET FORTH IN A SIGNED AGREEMENT BETWEEN
THE PARTIES OR IN AMD'S STANDARD TERMS AND CONDITIONS OF SALE. GD-18
©2018 ADVANCED MICRO DEVICES, INC. ALL RIGHTS RESERVED. AMD, THE AMD ARROW LOGO, RYZEN, RADEON AND COMBINATIONS THEREOF ARE TRADEMARKS OF ADVANCED MICRO DEVICES,
INC. 3DMARK AND PCMARK ARE REGISTERED TRADEMARKS OF FUTUREMARK CORPORATION IN THE UNITED STATES AND OTHER JURISDICTIONS. SPEC AND SPECVIEWPERF ARE REGISTERED
TRADEMARKS OF THE STANDARD PERFORMANCE EVALUATION CORPORATION IN THE UNITED STATES AND OTHER JURISDICTIONS. MICROSOFT, THE WINDOWS LOGO, AND DIRECTX ARE
TRADEMARKS AND/OR REGISTERED TRADEMARKS OF MICROSOFT CORPORATION IN THE UNITED STATES AND OTHER JURISDICTIONS. OTHER PRODUCT NAMES USED IN THIS PUBLICATION ARE FOR
IDENTIFICATION PURPOSES ONLY AND MAY BE TRADEMARKS OF THEIR RESPECTIVE COMPANIES.
