Today AMD unveiled preliminary details of its forthcoming GPU architecture, Vega. Conceived and executed over five years, the Vega architecture enables new possibilities in PC gaming, professional design, and machine intelligence that traditional GPU architectures have not been able to address effectively. Data-intensive workloads are becoming the new normal, and the parallel nature of the GPU lends itself ideally to tackling them. However, processing these huge new datasets requires fast access to massive amounts of memory. The Vega architecture's revolutionary memory subsystem enables GPUs to address very large data sets spread across a mix of memory types. The high-bandwidth cache controller in Vega-based GPUs can access on-package cache and off-package memory in a flexible, programmable fashion using fine-grained data movement.
Read the Full Story: http://wp.me/p3RLHQ-gbp
1. CONFIDENTIAL | UNDER NDA UNTIL JANUARY 5, 2017, 9AM EST
Vega
AMD’s Next-Generation GPU Architecture
2. We want rich, lavish virtual worlds.
3. We want to create with limitless detail, in real time.
4. We want to make decisions based on exabytes of data in an instant.
5. GPUs are taking on more diverse workloads
WORKSTATION: Physically Based Rendering, Physics Modeling, Loom (VR), Hi-Res HDR Content Creation
GAMING: 4K VR, Consoles, New Rendering Pipelines, New APIs, eSports
COMPUTE: Machine Learning, Image Recognition / Computer Vision, Natural Data Processing
6. Conventional architectures are not scaling to meet needs
7. Game install sizes are expanding exponentially
[Chart, for illustrative purposes: Deus Ex series install disk size (source: Steam). Y axis: relative data size in gigabytes. X axis: 2000 to 2016.]
8. Pro graphics data sets are well into the petabytes
[Chart: relative asset size in petabytes, 2001 to 2016, for The Lord of the Rings: Fellowship of the Ring; Avatar; The Hobbit: An Unexpected Journey; The Hobbit: The Desolation of Smaug; The Hobbit: Battle of the Five Armies; and The BFG. See endnote for details.]
9. Compute workloads have shot into the exabytes
[Chart: relative training set size in exabytes, 1995 to 2017, for Character Recognition, Object Detection, Image Recognition, and Image/Video Recognition. The final point is labeled "Data Point Too Big to Illustrate". See endnote for details.]
10. Growth in processing power is outpacing growth in memory capacity
[Chart: relative GPU compute (GFLOPS) vs. relative GPU storage capacity, 2002 to 2017. See endnote for details.]
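The size of that gap can be estimated from the first and last data points quoted in the endnotes (Radeon 9700 Pro and Radeon R9 Fury X). The sketch below is illustrative only: the variable and function names are ours, and it takes the endnote figures at face value.

```python
# Figures as quoted in the deck's endnotes (not independently verified).
radeon_9700_pro = {"tflops": 0.026, "framebuffer_mb": 128}
radeon_r9_fury_x = {"tflops": 8.6, "framebuffer_mb": 4 * 1024}  # 4 GB


def growth(first, last, key):
    """Return how many times larger `key` is in `last` than in `first`."""
    return last[key] / first[key]


compute_growth = growth(radeon_9700_pro, radeon_r9_fury_x, "tflops")
memory_growth = growth(radeon_9700_pro, radeon_r9_fury_x, "framebuffer_mb")

print(f"compute grew about {compute_growth:.0f}x")
print(f"memory grew {memory_growth:.0f}x")
print(f"compute outgrew memory by about {compute_growth / memory_growth:.1f}x")
```

On these two data points, compute throughput grew roughly 330x while frame buffer capacity grew 32x, about a tenfold difference.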
11. Introducing the world’s most scalable GPU memory architecture
12. High-Bandwidth Cache
13. HBM2
2X bandwidth per pin*
*vs HBM
14. HBM2
8X capacity/stack*
Over 50% smaller footprint (HBM2 vs. GDDR5)
*vs HBM
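To see what doubling the per-pin rate means per stack, here is a rough calculation. It assumes a 1024-bit interface per stack and per-pin rates of 1 Gbit/s for HBM and 2 Gbit/s for HBM2. These are commonly cited figures for the two standards, not numbers taken from the slides, and the function name is ours.

```python
PINS_PER_STACK = 1024  # assumed data interface width per stack, in bits


def stack_bandwidth_gb_per_s(gbit_per_s_per_pin, pins=PINS_PER_STACK):
    """Peak bandwidth of one stack in GB/s (8 bits per byte)."""
    return pins * gbit_per_s_per_pin / 8


hbm = stack_bandwidth_gb_per_s(1.0)   # assumed HBM per-pin rate
hbm2 = stack_bandwidth_gb_per_s(2.0)  # assumed HBM2 per-pin rate

print(hbm, hbm2, hbm2 / hbm)  # 128.0 256.0 2.0
```

Under these assumptions, the 2X per-pin figure carries straight through to per-stack bandwidth, because the interface width is unchanged.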
15. High-Bandwidth Cache Controller
[Diagram labels: High-Bandwidth Cache, HBCC, NV RAM, Network Storage, System DRAM]
16. High-Bandwidth Cache Controller
512 TB virtual address space
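As a quick sanity check on that figure, the snippet below works out how many address bits a 512 TB space implies. It assumes "TB" means 2**40 bytes, and the 8 GiB local memory used for scale is purely hypothetical.

```python
TIB = 2 ** 40  # assumption: the slide's "TB" means 2**40 bytes

address_space_bytes = 512 * TIB
# For an exact power of two, bit_length() - 1 is the base-2 logarithm.
address_bits = address_space_bytes.bit_length() - 1

# Hypothetical 8 GiB of on-package memory, for scale only.
local_memory_bytes = 8 * 2 ** 30
ratio = address_space_bytes // local_memory_bytes

print(address_bits)  # 49
print(ratio)         # 65536
```

In other words, 512 TB corresponds to 49-bit addressing, tens of thousands of times larger than a local memory of that hypothetical size.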
17. High-Bandwidth Cache Controller
Adaptive, fine-grained data movement
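The slides do not describe how the controller moves data. As a rough intuition for why fine-grained movement helps, the toy model below treats local memory as a least-recently-used cache of fixed-size pages in front of a larger backing store. The page size, class name, and eviction policy are all assumptions made for illustration, not a description of the HBCC.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes; hypothetical granularity for this sketch


class PageCache:
    """Toy LRU cache of fixed-size pages in front of a larger backing store."""

    def __init__(self, capacity_pages):
        self.capacity_pages = capacity_pages
        self.pages = OrderedDict()  # page number -> None, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, address):
        page = address // PAGE_SIZE
        if page in self.pages:
            self.hits += 1
            self.pages.move_to_end(page)  # mark as most recently used
            return
        self.misses += 1  # page must be moved in from the backing store
        self.pages[page] = None
        if len(self.pages) > self.capacity_pages:
            self.pages.popitem(last=False)  # evict the least recently used page

    @property
    def bytes_moved(self):
        return self.misses * PAGE_SIZE


# A 1 GiB allocation of which only the first 64 pages are touched, read twice.
allocation_bytes = 1 << 30
cache = PageCache(capacity_pages=256)
for _ in range(2):
    for address in range(0, 64 * PAGE_SIZE, 256):
        cache.access(address)

print(cache.misses, cache.hits, cache.bytes_moved, allocation_bytes)
```

In this toy run only 256 KiB is moved for a 1 GiB allocation, because pages are brought in when first touched rather than copying the whole allocation up front.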
18. [Two charts at Ultra 4K settings: graphics memory over time, comparing total allocations with memory actually accessed. See endnotes for details.]
19. Image from Deus Ex: Mankind Divided™ courtesy of Eidos Montreal
20. Image from Deus Ex: Mankind Divided™ courtesy of Eidos Montreal
21. New Programmable Geometry Pipeline
[Diagram labels: High-Bandwidth Cache, HBCC, NV RAM, Network Storage, System DRAM, Geometry Pipeline]
22. New Programmable Geometry Pipeline
Over 2X peak throughput per clock
See endnotes for details
25. Next-Generation Compute Engine
[Diagram labels: High-Bandwidth Cache, HBCC, NV RAM, Network Storage, System DRAM, Geometry Pipeline, Compute Engine]
26. Introducing Vega NCU: Next-Generation Compute Unit
512 8-bit ops per clock
256 16-bit ops per clock
128 32-bit ops per clock
Double Precision Rate is Configurable
*See endnotes for details
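One way to arrive at these per-clock figures is sketched below. It assumes the NCU keeps the 64 stream processors per unit that the endnotes give for the existing CU, that a fused multiply-add counts as two operations, and that narrower operands are packed into 32-bit lanes. None of this is confirmed by the slides, and the unit count and clock in the usage lines are hypothetical.

```python
LANES_PER_UNIT = 64  # stream processors per CU (endnotes); assumed for the NCU
OPS_PER_FMA = 2      # assumption: a fused multiply-add counts as two ops
REGISTER_BITS = 32   # assumption: operands are packed into 32-bit lanes


def ops_per_clock(operand_bits):
    """Operations per clock for one unit at the given operand width."""
    values_per_lane = REGISTER_BITS // operand_bits
    return LANES_PER_UNIT * OPS_PER_FMA * values_per_lane


def peak_tera_ops(units, clock_ghz, operand_bits):
    """Peak tera-operations per second for `units` units at `clock_ghz`."""
    return units * ops_per_clock(operand_bits) * clock_ghz / 1000


for bits in (32, 16, 8):
    print(bits, ops_per_clock(bits))

# Hypothetical configuration, for illustration only: 10 units at 1.0 GHz.
print(peak_tera_ops(units=10, clock_ghz=1.0, operand_bits=32))
```

Under these assumptions the function reproduces the slide's 128, 256, and 512 figures for 32-bit, 16-bit, and 8-bit operands.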
27. Rapid Packed Math
Supercharges performance of emerging workloads
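The slides give no instruction-level detail, but the general idea behind packed math can be sketched: two half-precision values share one 32-bit word, so a single operation on the word processes both. The example below models only that data layout using Python's `struct` module. The helper names are ours, and it says nothing about how Vega implements the feature.

```python
import struct


def pack_half2(a, b):
    """Pack two numbers as IEEE 754 half-precision values into one 32-bit word."""
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]


def unpack_half2(word):
    """Split a 32-bit word back into its two half-precision values."""
    return struct.unpack("<2e", struct.pack("<I", word))


def packed_add(x, y):
    """Add two packed words lane by lane, as one packed instruction would."""
    x0, x1 = unpack_half2(x)
    y0, y1 = unpack_half2(y)
    return pack_half2(x0 + y0, x1 + y1)


a = pack_half2(1.5, 2.0)
b = pack_half2(0.25, -1.0)
print(hex(a))                         # 0x40003e00
print(unpack_half2(packed_add(a, b)))  # (1.75, 1.0)
```

Each 32-bit word carries two results, which is where a doubling of 16-bit throughput relative to 32-bit would come from.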
28. NCU is optimized for higher clock speeds and higher IPC
[Diagram: CU* compared with NCU]
*See endnotes for details
29. We’ve been working on reducing memory bandwidth consumption for many years
Texture & Color Compression, Fast Z Clear, HiZ
30. Next Generation Pixel Engine
[Diagram labels: High-Bandwidth Cache, HBCC, NV RAM, Network Storage, System DRAM, Geometry Pipeline, Compute Engine, Pixel Engine]
31. Draw Stream Binning Rasterizer
Designed to improve performance and save power
Fetch once: enabled by smart primitive rasterization with on-chip bin cache
Shade once: enabled by culling of pixels invisible to final scene
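To make the "shade once" idea concrete, here is a toy sketch of binning. Primitives are sorted into screen tiles, visibility is resolved per tile, and only the nearest surface at each pixel is shaded. The tile size, the use of axis-aligned rectangles instead of triangles, and all names are assumptions for illustration. The comparison baseline shades every covered fragment, which is a worst case without any depth culling. This is not a description of Vega's hardware.

```python
from collections import defaultdict

TILE = 8  # tile edge in pixels; hypothetical


def bin_primitives(rects):
    """Map each tile (tx, ty) to the rectangles whose extent overlaps it."""
    bins = defaultdict(list)
    for rect in rects:
        x0, y0, x1, y1, _depth = rect
        for ty in range(y0 // TILE, (y1 - 1) // TILE + 1):
            for tx in range(x0 // TILE, (x1 - 1) // TILE + 1):
                bins[(tx, ty)].append(rect)
    return bins


def shade_binned(rects):
    """Resolve visibility per tile, then shade each visible pixel once."""
    shaded = {}
    for (tx, ty), tile_rects in bin_primitives(rects).items():
        for y in range(ty * TILE, (ty + 1) * TILE):
            for x in range(tx * TILE, (tx + 1) * TILE):
                covering = [r for r in tile_rects
                            if r[0] <= x < r[2] and r[1] <= y < r[3]]
                if covering:
                    # Keep only the nearest surface (smallest depth).
                    shaded[(x, y)] = min(covering, key=lambda r: r[4])
    return shaded


def fragments_without_culling(rects):
    """Fragments shaded if every covered pixel of every rectangle is shaded."""
    return sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1, _ in rects)


# (x0, y0, x1, y1, depth): half-open pixel ranges, smaller depth is nearer.
scene = [
    (0, 0, 16, 16, 0.9),  # far background quad
    (4, 4, 12, 12, 0.5),  # middle quad, fully inside the background
    (6, 6, 20, 10, 0.1),  # near quad, partly outside the background
]

shaded = shade_binned(scene)
print(fragments_without_culling(scene), len(shaded))  # 376 272
```

In this small scene, shading only visible pixels cuts the work from 376 fragments to 272. The on-chip bin cache that enables "fetch once" is not modeled here.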
32. Legacy Architecture: Non-coherent Pixel and Texture Memory Access
[Diagram labels: Compute Engine, Pixel Engine, Geometry Engine, three L1 caches, L2, Memory Controller, eight GDDR5 devices]
33. Render back-ends are now clients of the L2 cache.
[Diagram labels: High-Bandwidth Cache, HBCC, NV RAM, Network Storage, System DRAM, Geometry Pipeline, Compute Engine, Pixel Engine, three L1 caches, L2]
34. Helps improve performance with applications that use deferred shading.
[Diagram labels: High-Bandwidth Cache, HBCC, NV RAM, Network Storage, System DRAM, Geometry Pipeline, Compute Engine, Pixel Engine, three L1 caches, L2]
35. [Block diagram. Labels: Compute Engine, Pixel Engine, Geometry Pipeline, three L1 caches, L2, High-Bandwidth Cache Controller, High-Bandwidth Cache, NVRAM, Network Storage, System DRAM, CPU, MM, Display, XDMA, PCIe®]
36. Vega: GPU Architecture for the Immersive and Instinctive Computing Era
Revolutionary High Bandwidth Cache
New Programmable Geometry Pipeline
Next-Gen Compute Unit
Advanced Pixel Engine
37. ENDNOTES
Pro graphics data set slide: Data provided by a third party studio and not verified by AMD. Data is historic: cinema asset data sizes for Lord of the Rings (2001) @ 0.15 PB, Avatar (2009) @ 1 PB, The Hobbit, Part 1 (2012) @ 1.4 PB, The Hobbit, Part 2 (2013) @ 1.8 PB, The Hobbit, Part 3 (2014) @ 2.3 PB, and The BFG (2016) @ 3 PB. VG-6
Compute workload data set slide: Typical word character recognition data set defined as 18.3 MB (http://wordnet.princeton.edu/wordnet/download/old-versions/). Object identification datasets defined as 490 MB (http://host.robots.ox.ac.uk/pascal/VOC/databases.html#VOC2005_1). Image recognition defined as 144 GB (http://www.image-net.org/challenges/LSVRC/2010/download-all-nonpub). Image and video recognition datasets defined as 144 GB (http://image-net.org/challenges/LSVRC/2015/). Natural Data Analysis datasets defined as 2.5 QB (quintillion bytes, i.e. 2.5 exabytes) (http://www.vcloudnews.com/every-day-big-data-statistics-2-5-quintillion-bytes-of-data-created-daily/). VG-7
Growth in processing power slide: Data based on historic product specs; GPU relative frame buffer size vs relative TFLOP capability. The ATI Radeon 9700 Pro was 0.026 TFLOPs with 128 MB framebuffer. The ATI Radeon X950 XT was 0.08 TFLOPs with 256 MB framebuffer. The ATI Radeon X1900 XT was 0.375 TFLOPs with 512 MB framebuffer. The ATI Radeon HD 2900 XT was 0.4755 TFLOPs with 512 MB framebuffer. The ATI Radeon HD 4870 XT was 1.2 TFLOPs with 512 MB framebuffer. The ATI Radeon HD 5870 was 1.2 TFLOPs with 512 MB framebuffer. The AMD Radeon HD 7970 was 3.79 TFLOPs with 3 GB framebuffer. The AMD Radeon R9 290X was 5.63 TFLOPs with 4 GB framebuffer. The AMD Radeon R9 Fury X was 8.6 TFLOPs with 4 GB framebuffer. VG-5
Witcher 3 and Fallout 4 data slide: Data based on AMD internal testing of an early Vega sample using an AMD Summit Ridge pre-release CPU with 8 GB DDR4 RAM, Vega GPU, Windows 10 64-bit, AMD test driver as of Dec 5, 2016. Results may vary for final product, and performance may vary based on use of latest available drivers. VG-4
Geometry throughput slide: Data based on AMD Engineering design of Vega. Radeon R9 Fury X has 4 geometry engines and a peak of 4 polygons per clock. Vega is designed to handle up to 11 polygons per clock with 4 geometry engines. This represents an increase of 2.75x (11 / 4). VG-3
CU vs Vega NCU slide: Discrete AMD Radeon™ and FirePro™ GPUs based on the Graphics Core Next architecture consist of multiple discrete execution engines known as Compute Units (“CUs”). Each CU contains 64 shaders (“Stream Processors”) working together. GD-78