First Look at AMD Vega GPU Architecture

CONFIDENTIAL | UNDER NDA UNTIL JANUARY 5, 2017, 9AM EST
Vega
AMD’s Next-Generation GPU Architecture

We want rich, lavish
virtual worlds.

We want to create with limitless detail,
in real time.

We want to make decisions
based on exabytes of data
in an instant.

GPUs are taking on more
diverse workloads
WORKSTATION
Physically Based Rendering
Physics Modeling
Loom (VR)
Hi-Res HDR Content Creation
GAMING
4K VR
Consoles
New Rendering Pipelines
New APIs
eSports
COMPUTE
Machine Learning
Image Recognition / Computer Vision
Natural Data Processing
GPU

Conventional architectures are
not scaling to meet needs

Game install sizes are expanding exponentially
[Chart: Deus Ex series install disk size (source: Steam), relative data size in gigabytes, 2000 to 2016. Chart for illustrative purposes.]

Pro graphics data sets are well into the petabytes
[Chart: relative asset size in petabytes, 2001 to 2016. Titles shown: The Lord of the Rings: The Fellowship of the Ring; Avatar; The Hobbit: An Unexpected Journey; The Hobbit: The Desolation of Smaug; The Hobbit: The Battle of the Five Armies; The BFG.]
See endnote for details

Compute workloads have shot into the exabytes
[Chart: relative training set size in exabytes, 1995 to 2017. Categories shown: Character Recognition, Object Detection, Image Recognition, Image/Video Recognition. The final data point is marked "Too Big to Illustrate".]
See endnote for details

Growth in processing power is outpacing
growth in memory capacity
[Chart: Relative GPU Compute (GFLOPS) vs. Relative GPU Storage Capacity, 2002 to 2017]
See endnote for details
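
The gap between the two curves can be estimated from the first and last products listed in the endnote for this slide (Radeon 9700 Pro and Radeon R9 Fury X). The following is a minimal sketch of that arithmetic; the dictionary layout and the `growth` helper are illustrative names, not anything from the deck.

```python
# Figures transcribed from the deck's endnote: peak TFLOPs and framebuffer size.
FIRST = {"name": "ATI Radeon 9700 Pro", "tflops": 0.026, "framebuffer_mb": 128}
LAST = {"name": "AMD Radeon R9 Fury X", "tflops": 8.6, "framebuffer_mb": 4 * 1024}


def growth(first, last, key):
    """Return how many times larger `key` is in `last` than in `first`."""
    return last[key] / first[key]


compute_growth = growth(FIRST, LAST, "tflops")
memory_growth = growth(FIRST, LAST, "framebuffer_mb")

print(f"compute grew about {compute_growth:.0f}x")
print(f"memory grew about {memory_growth:.0f}x")
```

On these two data points alone, compute throughput grew roughly ten times faster than framebuffer capacity, which is the trend the slide describes.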

Introducing the world’s
most scalable GPU
memory architecture

High-Bandwidth Cache

HBM2
2X bandwidth per pin*
*vs HBM

HBM2
Over 50% smaller footprint
HBM2 vs. GDDR5
8X Capacity/stack*
*vs HBM

High-Bandwidth Cache Controller
[Diagram: High-Bandwidth Cache, HBCC, NV RAM, Network Storage, System DRAM]

High-Bandwidth Cache Controller
512 TB virtual address space
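
To put the 512 TB figure in perspective, the sketch below converts it to an address width and compares it with a local memory size. It assumes binary terabytes (2^40 bytes), and the 8 GiB High-Bandwidth Cache capacity is a hypothetical value chosen for illustration; the deck does not state one.

```python
TIB = 1 << 40  # assumption: "TB" on the slide means 2**40 bytes
GIB = 1 << 30

address_space_bytes = 512 * TIB

# 512 * 2**40 is an exact power of two, so bit_length() - 1 gives the
# number of address bits needed to cover the space.
address_bits = address_space_bytes.bit_length() - 1

# Hypothetical local cache size, for comparison only.
hbc_bytes = 8 * GIB
ratio = address_space_bytes // hbc_bytes

print(f"{address_bits}-bit virtual addresses")
print(f"address space is {ratio}x a hypothetical 8 GiB cache")
```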

Adaptive, fine-grained
data movement
High-Bandwidth
Cache Controller
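
The deck does not describe how the controller decides what to move. As a rough illustration of page-granular data movement in general, here is a minimal sketch of a small resident set backed by a larger store, with least-recently-used eviction. The `PageCache` class, the LRU policy, and the page numbering are all assumptions for illustration, not AMD's design.

```python
from collections import OrderedDict


class PageCache:
    """Toy model: a fixed number of resident pages with LRU eviction."""

    def __init__(self, capacity_pages):
        self.capacity_pages = capacity_pages
        self.resident = OrderedDict()  # page number -> True, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, page):
        if page in self.resident:
            # Page is already local: mark it most recently used.
            self.resident.move_to_end(page)
            self.hits += 1
            return
        # Page is not local: it would be fetched from the backing store.
        self.misses += 1
        if len(self.resident) >= self.capacity_pages:
            self.resident.popitem(last=False)  # evict least recently used
        self.resident[page] = True


cache = PageCache(capacity_pages=2)
for page in [0, 1, 0, 2, 1]:
    cache.access(page)

print(cache.hits, cache.misses, list(cache.resident))
```

Only the pages that are actually touched occupy local memory in this model, which is the general idea behind moving data in small units rather than whole allocations.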

[Charts: graphics memory over time at Ultra 4K settings, showing total allocations and memory accessed]
See endnotes for details

Image from Deus Ex: Mankind Divided™ courtesy of Eidos Montreal


New Programmable
Geometry Pipeline
[Diagram: High-Bandwidth Cache, HBCC, NV RAM, Network Storage, System DRAM, Geometry Pipeline]

New Programmable Geometry Pipeline
Over 2X peak throughput per clock
See endnotes for details
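
The endnote for this slide gives the per-clock figures behind the claim: a peak of 4 polygons per clock for Radeon R9 Fury X and up to 11 for Vega. The sketch below works out the ratio and converts a per-clock rate into polygons per second. The 1 GHz clock is a hypothetical value for illustration; the deck gives no clock speed.

```python
FURY_X_POLYGONS_PER_CLOCK = 4   # from the endnote
VEGA_POLYGONS_PER_CLOCK = 11    # from the endnote ("up to")

speedup = VEGA_POLYGONS_PER_CLOCK / FURY_X_POLYGONS_PER_CLOCK


def polygons_per_second(per_clock, clock_hz):
    """Peak polygon rate for a given per-clock rate and clock frequency."""
    return per_clock * clock_hz


CLOCK_HZ = 1_000_000_000  # hypothetical 1 GHz, not a figure from the deck

print(f"per-clock ratio: {speedup}x")
print(f"peak at 1 GHz: {polygons_per_second(VEGA_POLYGONS_PER_CLOCK, CLOCK_HZ):,} polygons/s")
```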

Primitive Shaders

Improved Load Balancing

Next-Generation
Compute Engine
[Diagram: High-Bandwidth Cache, HBCC, NV RAM, Network Storage, System DRAM, Geometry Pipeline, Compute Engine]

INTRODUCING
Vega NCU
Next-Generation Compute Unit
512 8-bit ops per clock
256 16-bit ops per clock
128 32-bit ops per clock
Double Precision Rate is Configurable
*See endnotes for details
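
The three rates on this slide follow a simple pattern: halving the operand width doubles the operations per clock. The sketch below reproduces the slide's numbers from the 32-bit rate and scales them to a whole-chip peak. The unit count and clock frequency in the usage lines are hypothetical values for illustration; the deck states neither.

```python
BASE_32BIT_OPS_PER_CLOCK = 128  # from the slide, per compute unit


def ops_per_clock(bits):
    """Ops per clock for one unit, assuming throughput scales with 32 / bits."""
    if bits not in (8, 16, 32):
        raise ValueError("only the 8-, 16- and 32-bit rates appear on the slide")
    return BASE_32BIT_OPS_PER_CLOCK * (32 // bits)


def peak_ops_per_second(bits, unit_count, clock_hz):
    """Peak rate for `unit_count` units running at `clock_hz`."""
    return ops_per_clock(bits) * unit_count * clock_hz


rates = {bits: ops_per_clock(bits) for bits in (32, 16, 8)}
print(rates)

# Hypothetical configuration: 64 units at 1 GHz.
print(peak_ops_per_second(32, unit_count=64, clock_hz=1_000_000_000))
```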

Rapid Packed Math
Supercharges performance of emerging workloads
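
The deck does not spell out how Rapid Packed Math works. One reading consistent with the 16-bit rate being double the 32-bit rate is that two 16-bit values share a single 32-bit register and are processed together. The sketch below emulates that idea in software with Python's `struct` half-precision format; the function names are illustrative and this is not a description of the hardware.

```python
import struct


def pack_half2(a, b):
    """Store two values as IEEE half-precision floats in one 32-bit integer."""
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]


def unpack_half2(word):
    """Recover the two half-precision values from a 32-bit integer."""
    return struct.unpack("<2e", struct.pack("<I", word))


def packed_add(x, y):
    """Add both halves of two packed words, as one packed operation would."""
    ax, bx = unpack_half2(x)
    ay, by = unpack_half2(y)
    return pack_half2(ax + ay, bx + by)


x = pack_half2(1.5, 2.0)
y = pack_half2(0.25, 4.0)
result = unpack_half2(packed_add(x, y))
print(result)
```

The values used here are exactly representable in half precision, so the round trip is lossless; arbitrary inputs would be rounded to 16 bits when packed.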

NCU is optimized for higher clock
speeds and higher IPC
CU*
NCU
*See endnotes for details

We’ve been working on reducing memory
bandwidth consumption for many years
Texture & Color Compression
Fast Z Clear
HiZ

Next Generation
Pixel Engine
[Diagram: High-Bandwidth Cache, HBCC, NV RAM, Network Storage, System DRAM, Geometry Pipeline, Compute Engine, Pixel Engine]

Draw Stream Binning Rasterizer
Designed to improve performance and save power
Fetch once, enabled by smart primitive rasterization with on-chip bin cache
Shade once, enabled by culling of pixels invisible in the final scene
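
The slide names the technique but not its internals. As a general illustration of binning, the sketch below sorts triangles into screen tiles by bounding box, so that each tile can later be processed with only the primitives that may touch it. The tile size, the function name, and the conservative bounding-box test are assumptions for illustration, and the sketch does not model the culling of hidden pixels.

```python
from collections import defaultdict

TILE = 32  # hypothetical tile size in pixels


def bin_primitives(triangles, width, height, tile=TILE):
    """Map each (tile_x, tile_y) to the indices of triangles that may cover it."""
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen, then convert to tile coordinates.
        x0 = max(int(min(xs)), 0) // tile
        x1 = min(int(max(xs)), width - 1) // tile
        y0 = max(int(min(ys)), 0) // tile
        y1 = min(int(max(ys)), height - 1) // tile
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)


triangles = [
    [(2, 2), (20, 4), (10, 25)],     # fits inside tile (0, 0)
    [(28, 10), (40, 12), (34, 30)],  # straddles tiles (0, 0) and (1, 0)
]
bins = bin_primitives(triangles, width=64, height=64)
print(bins)
```

A bounding-box test can list a triangle in a tile it does not actually cover, which costs some extra work but never drops visible geometry.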

Legacy Architecture: Non-coherent Pixel and Texture Memory Access
[Diagram: Compute Engine, Pixel Engine, Geometry Engine, L1 caches, L2, Memory Controller, eight GDDR5 devices]

Render back-ends
are now clients of
the L2 cache.
[Diagram: High-Bandwidth Cache, HBCC, NV RAM, Network Storage, System DRAM, Geometry Pipeline, Compute Engine, Pixel Engine, L1 caches, L2]

Helps improve
performance with
applications that use
deferred shading.
[Diagram: High-Bandwidth Cache, HBCC, NV RAM, Network Storage, System DRAM, Geometry Pipeline, Compute Engine, Pixel Engine, L1 caches, L2]

[Diagram: Compute Engine, Pixel Engine, Geometry Pipeline, L1 caches, L2, High Bandwidth Cache Controller, High-Bandwidth Cache, NVRAM, Network Storage, System DRAM, CPU, MM, Display, XDMA, PCIe®]

Vega
GPU Architecture for the Immersive and Instinctive Computing Era
Revolutionary High Bandwidth Cache
New Programmable Geometry Pipeline
Next-Gen Compute Unit
Advanced Pixel Engine

ENDNOTES
Pro graphics data set slide: Data provided by a third party studio and not verified by AMD. Data is historic: cinema asset data sizes for Lord of the Rings (2001) @ 0.15 PB, Avatar (2009) @ 1 PB, The Hobbit Part 1 (2012) @ 1.4 PB, The Hobbit Part 2 (2013) @ 1.8 PB, The Hobbit Part 3 (2014) @ 2.3 PB, and The BFG (2016) @ 3 PB. VG-6
Compute workload data set slide: Typical word character recognition data set defined as 18.3 MB (http://wordnet.princeton.edu/wordnet/download/old-versions/). Object identification data sets defined as 490 MB (http://host.robots.ox.ac.uk/pascal/VOC/databases.html#VOC2005_1). Image recognition defined as 144 GB (http://www.image-net.org/challenges/LSVRC/2010/download-all-nonpub). Image and video recognition data sets defined as 144 GB (http://image-net.org/challenges/LSVRC/2015/). Natural Data Analysis data sets defined as 2.5 quintillion bytes, or 2.5 EB (http://www.vcloudnews.com/every-day-big-data-statistics-2-5-quintillion-bytes-of-data-created-daily/). VG-7
Growth in processing power slide: Data based on historic product specs; GPU relative frame buffer size vs relative TFLOP capability. The ATI Radeon 9700 Pro was 0.026 TFLOPs with 128 MB
framebuffer. The ATI Radeon X950 XT was 0.08 TFLOPs with 256 MB framebuffer. The ATI Radeon X1900 XT was 0.375 TFLOPs with 512 MB framebuffer. The ATI Radeon HD 2900 XT was 0.4755
TFLOPs with 512 MB framebuffer. The ATI Radeon HD 4870 XT was 1.2 TFLOPs with 512 MB framebuffer. The ATI Radeon HD 5870 was 1.2 TFLOPs with 512 MB framebuffer. The AMD Radeon HD
7970 was 3.79 TFLOPs with 3 GB framebuffer. The AMD Radeon R9 290X was 5.63 TFLOPs with 4 GB framebuffer. The AMD Radeon R9 Fury X was 8.6 TFLOPs with 4 GB framebuffer. VG-5
Witcher 3 and Fallout 4 data slide: Data based on AMD Internal testing of an early Vega sample using an AMD Summit Ridge pre-release CPU with 8GB DDR4 RAM, Vega GPU, Windows 10 64 bit, AMD
test driver as of Dec 5, 2016. Results may vary for final product, and performance may vary based on use of latest available drivers. VG-4
Geometry throughput slide: Data based on AMD Engineering design of Vega. Radeon R9 Fury X has 4 geometry engines and a peak of 4 polygons per clock. Vega is designed to handle up to 11 polygons per clock with 4 geometry engines. This represents an increase of 2.75x. VG-3
CU vs Vega NCU slide: Discrete AMD Radeon™ and FirePro™ GPUs based on the Graphics Core Next architecture consist of multiple discrete execution engines known as Compute Units (“CUs”). Each CU contains 64 shaders (“Stream Processors”) working together. GD-78

DISCLAIMERS & ATTRIBUTIONS
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2016 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, CrossFire, FreeSync, Radeon and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. DirectX is a registered trademark of Microsoft Corporation in the US and other jurisdictions. 3DMark is a registered trademark of the Futuremark corporation.
PCIe is a registered trademark of PCI-SIG Corporation. Vulkan and the Vulkan logo are trademarks of Khronos Group Inc. Other names are for informational purposes only and may be trademarks of their respective owners. DOOM® images and logos © 2016 Bethesda Softworks LLC, a ZeniMax Media company. DOOM and related logos are registered trademarks or trademarks of id Software LLC in the U.S. and/or other countries. All Rights Reserved.
Deus Ex: Mankind Divided™ images and logos © 2016 Square Enix Ltd. All Rights Reserved. Deus Ex: Mankind Divided, Square Enix and Eidos are trademarks of the Square Enix Group.
