OPTIMIZING RAYTRACING
ON GCN WITH AMD
DEVELOPMENT TOOLS
TZACHI COHEN
NOVEMBER 2013
AGENDA
Overview of Raytracing & KD Trees

Review of GCN Architecture

Mapping Raytracing to GPUs

Optimizing Raytracing using CodeXL

2 | Optimizing Raytracing on GCN with AMD Development Tools | NOVEMBER 2013
Overview Of
Raytracing
ACCELERATION STRUCTURES TRADE-OFFS

[Figure: trade-off between construction speed and tracing speed. In order from the construction-speed end to the tracing-speed end: uniform grid, bounding volume hierarchies, KD tree.]

HIERARCHICAL KD TREE – 2D
[Figure: a 2D scene partitioned by a hierarchical KD tree, shown next to the corresponding tree; nodes are labelled A to G.]
KD TREE – 3D

STACK BASED TRAVERSAL KD TREE – 2D
[Figure: a ray crossing the 2D KD tree (nodes A to G) from tMin to tMax, with split-plane crossings marked t1 and t2.]
TRAVERSING KD TREES – PSEUDO CODE
stack.push(KDroot, sceneMin, sceneMax)
tHit = infinity
while !(stack.empty()):
    (node, tStart, tEnd) = stack.pop()
    while !(node.isLeaf()):
        tSplit = ( node.value - ray.origin[node.axis] ) / ray.direction[node.axis]
        (near, far) = findNear(ray.origin[node.axis], node.left, node.right)
        if( tSplit >= tEnd or tSplit < 0)
            node = near                      # only the near child is crossed
        else if( tSplit <= tStart)
            node = far                       # only the far child is crossed
        else
            stack.push( far, tSplit, tEnd)   # visit the far child later
            node = near
            tEnd = tSplit
    for prim in node.primitives():
        tHit = min(tHit, prim.Intersect(ray))
    if tHit < tEnd:
        return tHit
return tHit
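
The pseudocode above can be made runnable roughly as follows. This is only a sketch under assumptions that are not in the slides: primitives are spheres, the ray direction has unit length, a ray parallel to a split plane is treated as never crossing it, and the names Sphere, Node and trace_with_stack are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Sphere:
    """Stand-in primitive; the slides do not say which primitives are used."""
    center: Vec
    radius: float

    def intersect(self, origin: Vec, direction: Vec) -> float:
        # Nearest hit distance along a unit-length direction, or inf on a miss.
        oc = [o - c for o, c in zip(origin, self.center)]
        b = sum(d * x for d, x in zip(direction, oc))
        c = sum(x * x for x in oc) - self.radius ** 2
        disc = b * b - c
        if disc < 0.0:
            return math.inf
        t = -b - math.sqrt(disc)
        return t if t >= 0.0 else math.inf


@dataclass
class Node:
    axis: int = 0                      # split axis (inner nodes only)
    value: float = 0.0                 # split plane position
    left: Optional["Node"] = None      # child below the split plane
    right: Optional["Node"] = None     # child above the split plane
    prims: List[Sphere] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return self.left is None


def trace_with_stack(root, origin, direction, scene_min, scene_max):
    stack = [(root, scene_min, scene_max)]
    t_hit = math.inf
    while stack:
        node, t_start, t_end = stack.pop()
        while not node.is_leaf():
            a, d = node.axis, direction[node.axis]
            # A ray parallel to the split plane never crosses it.
            t_split = (node.value - origin[a]) / d if d != 0.0 else math.inf
            if origin[a] < node.value:
                near, far = node.left, node.right
            else:
                near, far = node.right, node.left
            if t_split >= t_end or t_split < 0.0:
                node = near                          # only the near side is reached
            elif t_split <= t_start:
                node = far                           # only the far side is reached
            else:
                stack.append((far, t_split, t_end))  # visit the far side later
                node, t_end = near, t_split
        for prim in node.prims:
            t_hit = min(t_hit, prim.intersect(origin, direction))
        if t_hit < t_end:                            # hit lies inside this leaf's interval
            return t_hit
    return t_hit


# Two leaves separated by the plane x = 0, one sphere in each.
scene = Node(axis=0, value=0.0,
             left=Node(prims=[Sphere((-2.0, 0.0, 0.0), 1.0)]),
             right=Node(prims=[Sphere((3.0, 0.0, 0.0), 1.0)]))
nearest = trace_with_stack(scene, (-5.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.0, 100.0)
```

Each frame pushed on the stack holds a node reference and two distances, which is the per-thread state that the later slides try to shrink.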

GCN
ARCHITECTURE
• First introduced with the “Southern Islands” family of GPUs.
• Also available in the upcoming “Kaveri” APU.
• Scalar architecture.
• ECC support (on some models).
• Double-precision support.
• Multiple concurrent queues for compute.

GPU SCALAR ARCHITECTURE VS CPU SSE EXTENSIONS
• float x;
• x = x + 1;
• Scalar code does not utilize the SSE capabilities of the CPU.

[Figure: sixteen separate threads, Thread 1 through Thread 16.]

HOW SCALAR CODE IS EXECUTED
• float x;
• x = x + 1;

[Figure: GCN running threads T1 through T16 side by side.]

IMPLICATIONS FOR RAY TRACING
• Ray packetization: a single thread traces several rays in one KD tree traversal to achieve better utilization of the SIMD units and the cache.
• No explicit ray packetization is required on GCN.
• The hardware implicitly packetizes every 64 threads: all 64 threads of a wavefront execute the same instruction together.

A SEQUENCER FOR EVERY COMPUTE UNIT

[Figure: four compute units, each with its own sequencer (SQ).]

• A sequencer is a hardware block responsible for issuing program instructions.
• A compute unit can run up to 40 wavefronts, each with a distinct program counter.
• GPU under-utilization due to rays with long traversals can therefore happen only at the wavefront level.
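
To make the wavefront-level effect concrete, the sketch below estimates how much of the issued work is useful when rays in the same wavefront need different numbers of traversal steps. It is a simplified model, not a measurement: it assumes one ray per thread, at least one step per wavefront, and that a wavefront keeps issuing instructions until its slowest ray finishes. The function name is hypothetical.

```python
WAVEFRONT_SIZE = 64  # threads that execute each instruction together on GCN


def wavefront_utilization(steps_per_ray):
    """Fraction of issued lane-steps doing useful work, per wavefront.

    steps_per_ray holds one traversal step count per ray (one ray per thread).
    Lanes of a partially filled last wavefront count as idle.
    """
    result = []
    for first in range(0, len(steps_per_ray), WAVEFRONT_SIZE):
        wave = steps_per_ray[first:first + WAVEFRONT_SIZE]
        issued = max(wave) * WAVEFRONT_SIZE   # the slowest ray sets the pace
        result.append(sum(wave) / issued)
    return result


uniform_wave = [8] * 64            # every ray needs 8 steps
one_long_ray = [8] * 63 + [64]     # a single ray needs 64 steps
utilization = wavefront_utilization(uniform_wave + one_long_ray)
```

In this model the long ray lowers the utilization of its own wavefront only; the first wavefront stays fully utilized because it has its own program counter.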
HOW MUCH ON CHIP MEMORY DO WE HAVE?
HD 7970 – “Tahiti”
256 KB VGPR per CU × 32 = 8.192 MB
8 KB SGPR per CU × 32 = 0.256 MB
16 KB L1 vector data cache per CU × 32 = 0.512 MB
16 KB L1 scalar data cache per 4 CUs × 8 = 0.128 MB
32 KB instruction cache per 4 CUs × 8 = 0.256 MB
L2 data cache = 768 KB
64 KB LDS per CU × 32 = 2.048 MB

Total: 12.16 MB
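
The total can be re-derived from the per-unit figures on this slide. The snippet below is just that arithmetic, using the slide's convention of 1 MB = 1000 KB; the dictionary keys are descriptive labels, not hardware identifiers.

```python
COMPUTE_UNITS = 32               # HD 7970 "Tahiti"
CU_GROUPS = COMPUTE_UNITS // 4   # caches shared by groups of 4 CUs

on_chip_kb = {
    "VGPR": 256 * COMPUTE_UNITS,
    "SGPR": 8 * COMPUTE_UNITS,
    "L1 vector data cache": 16 * COMPUTE_UNITS,
    "L1 scalar data cache": 16 * CU_GROUPS,
    "instruction cache": 32 * CU_GROUPS,
    "L2 data cache": 768,
    "LDS": 64 * COMPUTE_UNITS,
}

total_kb = sum(on_chip_kb.values())
total_mb = total_kb / 1000       # slide convention: 1 MB = 1000 KB
```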

AMD CODEXL
• Coherent, innovative and unified developer tools suite
‒ Debug, profile, and analyze applications
‒ Supports OpenCL™ and OpenGL
‒ AMD CPUs, GPUs and APUs
‒ Standalone and integrated into Microsoft® Visual Studio®
‒ Supported on Windows® and Linux®
‒ Does not require source code modifications

BE SURE YOUR KERNEL SIZE DOES NOT EXCEED
INSTRUCTION CACHE SIZE

Mapping
Raytracing To
GPUs
HOW CAN A GPU TRAVERSE A TREE?

[Figure: a tree of linked nodes.]

• Pack all the nodes into a single buffer and wrap the buffer in an OpenCL memory object (cl_mem).
• When using HSA, we can leverage the unified memory architecture and access the tree as-is.
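
One way to pack a pointer-based tree into a single buffer is to replace child pointers with array indices. The sketch below does this on the host side only; the 16-byte node layout, the use of -1 for "no child" and for leaves, and the names TreeNode, flatten and read_node are illustrative assumptions, not the layout used in the talk. The resulting bytes object is the kind of data that would then be handed to an OpenCL buffer.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    axis: int = -1                        # -1 marks a leaf in this sketch
    split: float = 0.0
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


NODE_FORMAT = "<ifii"                     # axis, split, left index, right index
NODE_SIZE = struct.calcsize(NODE_FORMAT)  # 16 bytes, no padding


def flatten(root: TreeNode) -> bytes:
    """Lay nodes out in depth-first order and replace pointers by indices."""
    order, index, pending = [], {}, [root]
    while pending:
        node = pending.pop()
        index[id(node)] = len(order)
        order.append(node)
        if node.left is not None:         # inner nodes are assumed to have both children
            pending.append(node.right)
            pending.append(node.left)
    buf = bytearray()
    for node in order:
        left = index[id(node.left)] if node.left is not None else -1
        right = index[id(node.right)] if node.right is not None else -1
        buf += struct.pack(NODE_FORMAT, node.axis, node.split, left, right)
    return bytes(buf)


def read_node(buf: bytes, i: int):
    """Decode node i, the way a kernel would index into the buffer."""
    return struct.unpack_from(NODE_FORMAT, buf, i * NODE_SIZE)


tree = TreeNode(axis=0, split=0.5,
                left=TreeNode(),
                right=TreeNode(axis=1, split=0.25,
                               left=TreeNode(), right=TreeNode()))
buffer = flatten(tree)
```

Index-based links stay valid wherever the buffer is copied, which is why this kind of layout suits a discrete GPU; with unified memory the original pointers could be followed directly.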

HOW MUCH MEMORY DO WE NEED FOR THE STACK?

Per wavefront = maximal depth of the tree × size of a stack frame × 64 threads.

25 × 12 bytes × 64 = 19,200 bytes ≈ 19 KB
This leads to GPR spilling to memory (private, per-thread memory, not the LDS) or to low scheduling utilization.
The OpenCL compiler decides on GPR spilling at compile time.
GPRs spilled to memory are also known as scratch registers.
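
The stack-size estimate can be checked with a few lines of arithmetic. The figures (depth 25, 12-byte frame, 64 threads) come from the slide; the function name is hypothetical, and the register count assumes 32-bit (4-byte) vector registers.

```python
WAVEFRONT_SIZE = 64   # threads per wavefront
VGPR_BYTES = 4        # one vector register holds a 32-bit value per thread


def stack_bytes_per_wavefront(max_depth: int, frame_bytes: int) -> int:
    """Memory needed if every thread keeps a full-depth traversal stack."""
    return max_depth * frame_bytes * WAVEFRONT_SIZE


full_stack_bytes = stack_bytes_per_wavefront(max_depth=25, frame_bytes=12)
full_stack_kb = full_stack_bytes / 1024
registers_per_thread = 25 * 12 // VGPR_BYTES   # registers taken by the stack alone
```

Here the stack alone would occupy 75 registers per thread before any other kernel state is counted, which is why the compiler may choose to spill.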

HOW TO DETECT SCRATCH REGISTERS USING CODEXL

STACKLESS TRACE – RESTART TRAVERSAL
[Figure: restart traversal on the 2D KD tree (nodes A to G). The ray runs from tMin to tMax with split-plane crossings t1, t2 and t3; each restart from the root continues from the previous end point.]
KD RESTART ALGORITHM
tStart = tEnd = sceneMin
tHit = infinity
while (tEnd < sceneMax):
    node = root
    tStart = tEnd
    tEnd = sceneMax
    while (not node.isLeaf()):
        axis = node.axis
        tSplit = ( node.PlanePos - ray.origin[axis] ) / ray.direction[axis]
        (near, far) = findNear(ray.origin[axis], node.left, node.right)
        if( tSplit >= tEnd or tSplit <= 0)
            node = near
        else if( tSplit <= tStart)
            node = far
        else
            node = near      # nothing is pushed; the far child is found by restarting
            tEnd = tSplit
    for prim in node.primitives():
        tHit = min(tHit, prim.Intersect(ray))
    if tHit < tEnd:
        return tHit
return tHit
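
A runnable version of the restart traversal might look like the sketch below. As assumptions that are not in the slides, primitives are spheres, the ray direction has unit length, and a ray parallel to a split plane is treated as never crossing it; Sphere, Node, trace_restart and brute_force are hypothetical names. The brute-force function is only there to cross-check results.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Sphere:
    center: Vec
    radius: float

    def intersect(self, origin: Vec, direction: Vec) -> float:
        # Nearest hit distance along a unit-length direction, or inf on a miss.
        oc = [o - c for o, c in zip(origin, self.center)]
        b = sum(d * x for d, x in zip(direction, oc))
        c = sum(x * x for x in oc) - self.radius ** 2
        disc = b * b - c
        if disc < 0.0:
            return math.inf
        t = -b - math.sqrt(disc)
        return t if t >= 0.0 else math.inf


@dataclass
class Node:
    axis: int = 0
    value: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prims: List[Sphere] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return self.left is None


def trace_restart(root, origin, direction, scene_min, scene_max):
    t_start = t_end = scene_min
    t_hit = math.inf
    while t_end < scene_max:
        # Restart from the root, continuing where the last leaf ended.
        node, t_start, t_end = root, t_end, scene_max
        while not node.is_leaf():
            a, d = node.axis, direction[node.axis]
            t_split = (node.value - origin[a]) / d if d != 0.0 else math.inf
            if origin[a] < node.value:
                near, far = node.left, node.right
            else:
                near, far = node.right, node.left
            if t_split >= t_end or t_split <= 0.0:
                node = near
            elif t_split <= t_start:
                node = far
            else:
                node, t_end = near, t_split   # no push: no per-thread stack
        for prim in node.prims:
            t_hit = min(t_hit, prim.intersect(origin, direction))
        if t_hit < t_end:
            return t_hit
    return t_hit


def brute_force(spheres, origin, direction):
    return min((s.intersect(origin, direction) for s in spheres), default=math.inf)


s1 = Sphere((-4.0, 0.0, 0.0), 0.5)
s2 = Sphere((-1.5, 0.0, 0.0), 0.5)
s3 = Sphere((3.0, 0.0, 0.0), 1.0)
# Root splits at x = 0; its left child splits again at x = -3.
scene = Node(axis=0, value=0.0,
             left=Node(axis=0, value=-3.0,
                       left=Node(prims=[s1]), right=Node(prims=[s2])),
             right=Node(prims=[s3]))
direction = (1.0, 0.0, 0.0)
hit_first_leaf = trace_restart(scene, (-6.0, 0.0, 0.0), direction, 0.0, 100.0)
hit_last_leaf = trace_restart(scene, (-6.0, 0.8, 0.0), direction, 0.0, 100.0)
```

The second ray misses the two nearer spheres, so the traversal restarts from the root twice before it reaches the leaf containing the third sphere. That repeated descent is the price paid for keeping no stack.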
EFFECT ON GPR SPILLAGE

Demo
Optimizing
Raytracing
using CodeXL
CAN THIS BE FURTHER REFINED?
• Which on-chip memory are we not using yet?
The LDS (Local Data Store).
Short-stack algorithm: allocate a stack smaller than the maximum depth of the tree. If it overflows, fall back to the KD restart algorithm.
If we place the short stack in the LDS, how deep should the short stack be?
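
The short-stack idea can be sketched as follows. This is one possible reading, not necessarily the variant used in the talk: on overflow the oldest frame is dropped, and when the stack runs empty before the ray has left the scene, traversal restarts from the root. Spheres as primitives, a unit-length ray direction, and all names (Sphere, Node, trace_short_stack, capacity) are assumptions. The function also returns the number of restarts so the effect of the capacity is visible.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Sphere:
    center: Vec
    radius: float

    def intersect(self, origin: Vec, direction: Vec) -> float:
        # Nearest hit distance along a unit-length direction, or inf on a miss.
        oc = [o - c for o, c in zip(origin, self.center)]
        b = sum(d * x for d, x in zip(direction, oc))
        c = sum(x * x for x in oc) - self.radius ** 2
        disc = b * b - c
        if disc < 0.0:
            return math.inf
        t = -b - math.sqrt(disc)
        return t if t >= 0.0 else math.inf


@dataclass
class Node:
    axis: int = 0
    value: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prims: List[Sphere] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return self.left is None


def trace_short_stack(root, origin, direction, scene_min, scene_max, capacity):
    """Return (hit distance, number of restarts from the root)."""
    stack = []                    # never holds more than `capacity` frames
    t_start = t_end = scene_min
    t_hit = math.inf
    descents = 0
    while t_end < scene_max:
        if stack:
            node, t_start, t_end = stack.pop()
        else:                     # empty stack: restart from the root
            node, t_start, t_end = root, t_end, scene_max
            descents += 1
        while not node.is_leaf():
            a, d = node.axis, direction[node.axis]
            t_split = (node.value - origin[a]) / d if d != 0.0 else math.inf
            if origin[a] < node.value:
                near, far = node.left, node.right
            else:
                near, far = node.right, node.left
            if t_split >= t_end or t_split <= 0.0:
                node = near
            elif t_split <= t_start:
                node = far
            else:
                if capacity > 0:
                    if len(stack) == capacity:
                        stack.pop(0)          # overflow: forget the oldest frame
                    stack.append((far, t_split, t_end))
                node, t_end = near, t_split
        for prim in node.prims:
            t_hit = min(t_hit, prim.intersect(origin, direction))
        if t_hit < t_end:
            break
    return t_hit, descents - 1


scene = Node(axis=0, value=0.0,
             left=Node(axis=0, value=-3.0,
                       left=Node(prims=[Sphere((-4.0, 0.0, 0.0), 0.5)]),
                       right=Node(prims=[Sphere((-1.5, 0.0, 0.0), 0.5)])),
             right=Node(prims=[Sphere((3.0, 0.0, 0.0), 1.0)]))
ray = ((-6.0, 0.8, 0.0), (1.0, 0.0, 0.0))
results = {cap: trace_short_stack(scene, ray[0], ray[1], 0.0, 100.0, cap)
           for cap in (0, 1, 2)}
```

With capacity 0 the function behaves like the pure restart traversal, and with a capacity at least as large as the tree depth it behaves like the full-stack traversal; the hit distance is the same in every case, only the number of restarts changes.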

HOW MANY WAVEFRONTS ARE EXECUTED
CONCURRENTLY
• Use the CodeXL application trace to discover how many wavefronts are executed concurrently with stackless traversal.

OCCUPANCY GRAPHS

WHAT SHOULD BE THE SIZE OF THE SHORT STACK?

64 KB / 12 wavefronts / 64 threads / sizeof(Frame) ≈ 7 entries
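
The division can be reproduced as below, assuming the 12-byte frame from the earlier stack-size estimate and taking 64 KB as 65,536 bytes; the function name is hypothetical.

```python
LDS_BYTES_PER_CU = 64 * 1024
WAVEFRONT_SIZE = 64


def short_stack_depth(concurrent_wavefronts: int, frame_bytes: int) -> int:
    """Largest per-thread stack depth that fits in one compute unit's LDS."""
    bytes_per_stack_level = concurrent_wavefronts * WAVEFRONT_SIZE * frame_bytes
    return LDS_BYTES_PER_CU // bytes_per_stack_level


depth = short_stack_depth(concurrent_wavefronts=12, frame_bytes=12)
```

The result depends directly on the measured number of concurrent wavefronts: fewer wavefronts in flight would leave room for a deeper short stack, and more would shrink it.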

Demo
RESULTS
[Figure: bar chart comparing full stack, stackless, short stack, and short stack on LDS; vertical axis ticks run from 60 to 120.]

• Results are in millions of rays per second on a Radeon™ HD 7970.

Questions?

DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and
typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to
product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences
between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or
otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to
time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
OpenCL™ is a trademark of Apple Inc. which is licensed to the Khronos organization. Linux™ is the trademark of Linus Torvalds.
Microsoft™ and Windows™ are the trademarks of Microsoft Corp. All other names used in this presentation are for
informational purposes only and may be trademarks of their respective owners.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR
ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO
EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM
THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of
Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be
trademarks of their respective owners.

35 | Optimizing Raytracing on GCN with AMD Development Tools | NOVEMBER 2013

More Related Content

What's hot

HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...AMD Developer Central
 
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl HilleslandPG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl HilleslandAMD Developer Central
 
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauGS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauAMD Developer Central
 
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...AMD Developer Central
 
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...AMD Developer Central
 
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...AMD Developer Central
 
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...AMD Developer Central
 
GS-4139, RapidFire for Cloud Gaming, by Dmitry Kozlov
GS-4139, RapidFire for Cloud Gaming, by Dmitry KozlovGS-4139, RapidFire for Cloud Gaming, by Dmitry Kozlov
GS-4139, RapidFire for Cloud Gaming, by Dmitry KozlovAMD Developer Central
 
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...AMD Developer Central
 
WT-4151, Efficient Delivery of 3D Web Contents with Khronos and MPEG Technolo...
WT-4151, Efficient Delivery of 3D Web Contents with Khronos and MPEG Technolo...WT-4151, Efficient Delivery of 3D Web Contents with Khronos and MPEG Technolo...
WT-4151, Efficient Delivery of 3D Web Contents with Khronos and MPEG Technolo...AMD Developer Central
 
PG-4119, 3D Geometry Compression on GPU, by Jacques Lefaucheux
PG-4119, 3D Geometry Compression on GPU, by Jacques LefaucheuxPG-4119, 3D Geometry Compression on GPU, by Jacques Lefaucheux
PG-4119, 3D Geometry Compression on GPU, by Jacques LefaucheuxAMD Developer Central
 
HC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasHC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasAMD Developer Central
 
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...AMD Developer Central
 
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...AMD Developer Central
 
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...AMD Developer Central
 
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by Mikael ...
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by  Mikael ...WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by  Mikael ...
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by Mikael ...AMD Developer Central
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozAMD Developer Central
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerAMD Developer Central
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahAMD Developer Central
 
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...AMD Developer Central
 

What's hot (20)

HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
 
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl HilleslandPG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
 
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauGS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
 
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
 
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
 
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
 
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
 
GS-4139, RapidFire for Cloud Gaming, by Dmitry Kozlov
GS-4139, RapidFire for Cloud Gaming, by Dmitry KozlovGS-4139, RapidFire for Cloud Gaming, by Dmitry Kozlov
GS-4139, RapidFire for Cloud Gaming, by Dmitry Kozlov
 
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
 
WT-4151, Efficient Delivery of 3D Web Contents with Khronos and MPEG Technolo...
WT-4151, Efficient Delivery of 3D Web Contents with Khronos and MPEG Technolo...WT-4151, Efficient Delivery of 3D Web Contents with Khronos and MPEG Technolo...
WT-4151, Efficient Delivery of 3D Web Contents with Khronos and MPEG Technolo...
 
PG-4119, 3D Geometry Compression on GPU, by Jacques Lefaucheux
PG-4119, 3D Geometry Compression on GPU, by Jacques LefaucheuxPG-4119, 3D Geometry Compression on GPU, by Jacques Lefaucheux
PG-4119, 3D Geometry Compression on GPU, by Jacques Lefaucheux
 
HC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasHC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu Das
 
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
 
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
 
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
 
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by Mikael ...
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by  Mikael ...WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by  Mikael ...
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by Mikael ...
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
 
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
 

Similar to PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi Cohen

Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14AMD Developer Central
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUsiguazio
 
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019NVIDIA
 
Talk on commercialising space data
Talk on commercialising space data Talk on commercialising space data
Talk on commercialising space data Alison B. Lowndes
 
High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1NVIDIA
 
2016 06 nvidia-isc_supercomputing_car_v02
2016 06 nvidia-isc_supercomputing_car_v022016 06 nvidia-isc_supercomputing_car_v02
2016 06 nvidia-isc_supercomputing_car_v02Carlo Nardone
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornAMD Developer Central
 
Droidcon2013 triangles gangolells_imagination
Droidcon2013 triangles gangolells_imaginationDroidcon2013 triangles gangolells_imagination
Droidcon2013 triangles gangolells_imaginationDroidcon Berlin
 
GTC 2018 で発表された自動運転最新情報
GTC 2018 で発表された自動運転最新情報GTC 2018 で発表された自動運転最新情報
GTC 2018 で発表された自動運転最新情報NVIDIA Japan
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentationtestSri1
 
"Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ...
"Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ..."Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ...
"Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ...Edge AI and Vision Alliance
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsConnected Data World
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Databricks
 
Enabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesEnabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesWithTheBest
 
GTC 2019 Keynote in Silicon Valley
GTC 2019 Keynote in Silicon ValleyGTC 2019 Keynote in Silicon Valley
GTC 2019 Keynote in Silicon ValleyNVIDIA
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステムShinnosuke Furuya
 

Similar to PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi Cohen (20)

Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
Gcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodesGcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodes
 
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
 
AI talk at CogX 2018
AI talk at CogX 2018AI talk at CogX 2018
AI talk at CogX 2018
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 
Talk on commercialising space data
Talk on commercialising space data Talk on commercialising space data
Talk on commercialising space data
 
High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1
 
2016 06 nvidia-isc_supercomputing_car_v02
2016 06 nvidia-isc_supercomputing_car_v022016 06 nvidia-isc_supercomputing_car_v02
2016 06 nvidia-isc_supercomputing_car_v02
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
 
Droidcon2013 triangles gangolells_imagination
Droidcon2013 triangles gangolells_imaginationDroidcon2013 triangles gangolells_imagination
Droidcon2013 triangles gangolells_imagination
 
GTC 2018 で発表された自動運転最新情報
GTC 2018 で発表された自動運転最新情報GTC 2018 で発表された自動運転最新情報
GTC 2018 で発表された自動運転最新情報
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentation
 
"Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ...
"Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ..."Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ...
"Collaboratively Benchmarking and Optimizing Deep Learning Implementations," ...
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
Enabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesEnabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. Lowndes
 
GTC 2019 Keynote in Silicon Valley
GTC 2019 Keynote in Silicon ValleyGTC 2019 Keynote in Silicon Valley
GTC 2019 Keynote in Silicon Valley
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
 

More from AMD Developer Central

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesAMD Developer Central
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar




PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi Cohen

  • 1. OPTIMIZING RAYTRACING ON GCN WITH AMD DEVELOPMENT TOOLS TZACHI COHEN NOVEMBER 2013
  • 2. AGENDA
      ‒ Overview of Raytracing & KD Trees
      ‒ Review of GCN Architecture
      ‒ Mapping Raytracing to GPUs
      ‒ Optimizing Raytracing using CodeXL
  • 4. ACCELERATION STRUCTURES TRADE-OFFS [Figure: a scale running from "Construction Speed" to "Tracing Speed", labeled Uniform Grid, Bounding Volume Hierarchies, KD Tree.]
  • 5. HIERARCHICAL KD TREE – 2D [Figure: a 2D scene partitioned by a KD tree, shown with its tree of nodes labeled A to G.]
  • 6. KD TREE – 3D
  • 7. STACK BASED TRAVERSAL KD TREE – 2D [Figure: a ray crossing the 2D KD tree of nodes A to G, with the ray parameters tMin, t1, t2 and tMax labeled.]
  • 8. TRAVERSING KD TREES – PSEUDO CODE
        stack.push(KDroot, sceneMin, sceneMax)
        tHit = infinity
        while !(stack.empty()):
            (node, tStart, tEnd) = stack.pop()
            while !(node.isLeaf()):
                tSplit = (node.value - ray.origin[node.axis]) / ray.direction[node.axis]
                (near, far) = findNear(ray.origin[node.axis], node.left, node.right)
                if (tSplit >= tEnd or tSplit < 0)
                    node = near
                else if (tSplit <= tStart)
                    node = far
                else
                    stack.push(far, tSplit, tEnd)
                    node = near
                    tEnd = tSplit
            for prim in node.primitives():
                tHit = min(tHit, prim.Intersect(ray))
            if tHit < tEnd:
                return tHit
        return tHit
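A minimal runnable sketch of the stack-based traversal above, in Python rather than OpenCL. The `Node` class, the 2D circle primitives and `hit_circle` are illustrative assumptions and are not part of the deck. The sketch also treats `tSplit <= 0` as "near only" (as the restart slide does) and guards against a zero direction component, two details the pseudocode leaves open.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical KD node: inner nodes have children, leaves hold circles."""
    axis: int = 0
    value: float = 0.0                          # split plane position
    left: "Node | None" = None
    right: "Node | None" = None
    prims: list = field(default_factory=list)   # (cx, cy, radius) tuples

    def is_leaf(self):
        return self.left is None


def hit_circle(origin, direction, circle):
    """Smallest t >= 0 where the ray enters the circle, or inf on a miss."""
    cx, cy, r = circle
    ox, oy = origin[0] - cx, origin[1] - cy
    dx, dy = direction
    a = dx * dx + dy * dy
    b = 2.0 * (ox * dx + oy * dy)
    c = ox * ox + oy * oy - r * r
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return math.inf
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else math.inf


def traverse(root, origin, direction, t_min, t_max):
    """Stack-based traversal; t_min/t_max are the ray's scene entry and exit."""
    stack = [(root, t_min, t_max)]
    t_hit = math.inf
    while stack:
        node, t_start, t_end = stack.pop()
        while not node.is_leaf():
            a, d = node.axis, direction[node.axis]
            t_split = (node.value - origin[a]) / d if d != 0.0 else math.inf
            below = origin[a] < node.value or (origin[a] == node.value and d <= 0.0)
            near, far = (node.left, node.right) if below else (node.right, node.left)
            if t_split >= t_end or t_split <= 0.0:
                node = near                      # ray segment stays on the near side
            elif t_split <= t_start:
                node = far                       # ray segment starts past the plane
            else:
                stack.append((far, t_split, t_end))
                node, t_end = near, t_split
        for prim in node.prims:
            t_hit = min(t_hit, hit_circle(origin, direction, prim))
        if t_hit < t_end:
            return t_hit
    return t_hit


# Two leaves split at x = 0; only the right-hand circle lies on the ray.
root = Node(axis=0, value=0.0,
            left=Node(prims=[(-2.0, 3.0, 0.5)]),
            right=Node(prims=[(2.0, 0.0, 0.5)]))
t = traverse(root, (-5.0, 0.0), (1.0, 0.0), 0.0, 10.0)
```

The early `return` inside the outer loop is what makes the front-to-back order pay off: once a hit lies inside the current leaf's interval, no later leaf can contain a closer one.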
  • 10.
      ‒ First introduced with the “Southern Islands” family of GPUs.
      ‒ Also available in the upcoming “Kaveri” APU.
      ‒ Scalar architecture.
      ‒ ECC support (on some models).
      ‒ Double precision support.
      ‒ Multiple concurrent queues for compute.
  • 11. GPU SCALAR ARCHITECTURE VS CPU SSE EXTENSIONS
      ‒ float x;
      ‒ x = x + 1;
      ‒ Scalar code does not utilize the SSE capabilities of the CPU.
      [Figure: 16 separate threads, labeled Thread 1 to Thread 16.]
  • 12. HOW SCALAR CODE IS EXECUTED
      ‒ float x;
      ‒ x = x + 1;
      [Figure: a GCN block containing threads T1 to T16.]
  • 13. IMPLICATIONS FOR RAY TRACING
      ‒ Ray packetization: a single thread traces several rays in one KD tree traversal to achieve better utilization of the SIMD units and the cache.
      ‒ No explicit ray packetization is required on GCN.
      ‒ The hardware implicitly packetizes every 64 threads: all 64 threads of a wavefront execute the same instruction together.
  • 14. A SEQUENCER FOR EVERY COMPUTE UNIT [Figure: four compute units, each with its own sequencer (SQ).]
      ‒ A sequencer is a hardware block responsible for issuing program instructions.
      ‒ A compute unit can run up to 40 wavefronts, each with a distinct program counter.
      ‒ GPU under-utilization due to rays with long traversals can happen only at the wavefront level.
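To make the wavefront-level under-utilization concrete, here is a toy model in Python. It assumes, as a simplification not stated in the deck, that a wavefront stays busy until its slowest ray finishes and that every traversal step costs the same; the step counts are made up for illustration.

```python
WAVEFRONT_SIZE = 64  # threads per wavefront, as stated on slide 13


def wavefront_utilization(steps_per_ray):
    """Fraction of lane-steps doing useful work when all lanes run in lockstep."""
    if len(steps_per_ray) != WAVEFRONT_SIZE:
        raise ValueError("expected one step count per lane")
    useful = sum(steps_per_ray)
    issued = WAVEFRONT_SIZE * max(steps_per_ray)  # every lane waits for the slowest
    return useful / issued


uniform = wavefront_utilization([10] * 64)
one_long_ray = wavefront_utilization([10] * 63 + [100])
```

In this model a single long ray drags its own wavefront down to roughly 11% utilization, while other wavefronts on the same compute unit are unaffected because each has its own program counter.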
  • 15. HOW MUCH ON CHIP MEMORY DO WE HAVE? HD 7970 – “Tahiti”
      ‒ VGPR: 256 KB per CU × 32 = 8.192 MB
      ‒ SGPR: 8 KB per CU × 32 = 0.256 MB
      ‒ L1 V-Data cache: 16 KB per CU × 32 = 0.512 MB
      ‒ L1 S-Data cache: 16 KB per 4 CUs × 8 = 0.128 MB
      ‒ Instruction cache: 32 KB per 4 CUs × 8 = 0.256 MB
      ‒ L2 data cache: 768 KB
      ‒ LDS: 64 KB per CU × 32 = 2.048 MB
      ‒ Total: 12.16 MB
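The total on this slide can be checked with a few lines of Python. The per-unit sizes are the ones listed above; the only assumption is that the slide converts with 1 MB = 1000 KB, which is what makes the figures add up to 12.16 MB.

```python
COMPUTE_UNITS = 32  # HD 7970 "Tahiti", per the slide

on_chip_kb = {
    "VGPR": 256 * COMPUTE_UNITS,
    "SGPR": 8 * COMPUTE_UNITS,
    "L1 vector data cache": 16 * COMPUTE_UNITS,
    "L1 scalar data cache": 16 * (COMPUTE_UNITS // 4),  # shared by 4 CUs
    "instruction cache": 32 * (COMPUTE_UNITS // 4),     # shared by 4 CUs
    "L2 data cache": 768,
    "LDS": 64 * COMPUTE_UNITS,
}

total_kb = sum(on_chip_kb.values())
total_mb = total_kb / 1000  # decimal megabytes, matching the slide
```

Two thirds of the total sits in the vector register file, which is why the later slides focus on register pressure and on the otherwise idle LDS.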
  • 16. AMD CODEXL
      Coherent, innovative and unified developer tools suite:
      ‒ Debug, profile, and analyze applications
      ‒ Supports OpenCL™ and OpenGL
      ‒ AMD CPUs, GPUs and APUs
      ‒ Standalone and integrated into Microsoft® Visual Studio®
      ‒ Supported on Windows® and Linux®
      ‒ Does not require source code modifications
  • 17. BE SURE YOUR KERNEL SIZE DOES NOT EXCEED INSTRUCTION CACHE SIZE
  • 19. HOW CAN A GPU TRAVERSE A TREE? [Figure: a tree of seven nodes.]
      ‒ Pack all the nodes into a single buffer and wrap the buffer in an OpenCL memory object.
      ‒ When using HSA we can leverage the unified memory architecture and access the tree as-is.
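One way to pack a pointer-based tree into a single buffer is sketched below in Python. The 16-byte record layout, the tuple encoding of the input tree and the function names are all assumptions made for illustration; the deck does not show its node format. Child pointers become indices into the buffer, so the bytes can be copied to the device unchanged.

```python
import struct

# Hypothetical record: int32 axis (-1 marks a leaf), float32 split, int32 a, int32 b.
# Inner node: a = index of the right child (the left child is always index + 1).
# Leaf node:  a = first primitive index, b = primitive count.
NODE_FORMAT = "<ifii"
NODE_SIZE = struct.calcsize(NODE_FORMAT)


def flatten(node, out):
    """Append node and its subtree to out in depth-first order; return its index."""
    index = len(out)
    out.append(None)                       # reserve the slot before recursing
    if len(node) == 2:                     # leaf: (first_prim, prim_count)
        first_prim, prim_count = node
        out[index] = (-1, 0.0, first_prim, prim_count)
    else:                                  # inner: (axis, split, left, right)
        axis, split, left, right = node
        flatten(left, out)                 # lands at index + 1
        right_index = flatten(right, out)
        out[index] = (axis, split, right_index, 0)
    return index


def pack_tree(root):
    """Return the whole tree as one contiguous bytes object."""
    records = []
    flatten(root, records)
    return b"".join(struct.pack(NODE_FORMAT, *record) for record in records)


tree = (0, 0.0, (0, 2), (1, 1.5, (2, 1), (3, 4)))
buffer = pack_tree(tree)
```

Storing the left child implicitly at `index + 1` saves one field per node and keeps the near child of many rays adjacent in memory; whether that helps on a given GPU would need to be measured.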
  • 20. HOW MUCH MEMORY DO WE NEED FOR THE STACK?
      ‒ Per wavefront = maximal depth of the tree × size of a stack frame × 64 threads: 25 × 12 bytes × 64 = ~19 KB.
      ‒ A stack this large leads to GPR spilling to local memory or to low scheduling utilization.
      ‒ The OpenCL compiler decides on GPR spilling at compile time.
      ‒ GPRs spilled to local memory are also known as scratch registers.
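The stack estimate above, worked through in Python. The depth of 25 and the 12-byte frame come from the slide; reading the frame as three 4-byte fields (node index, tStart, tEnd) is an assumption based on what the traversal pseudocode pushes.

```python
MAX_TREE_DEPTH = 25   # from the slide
FRAME_BYTES = 12      # assumed: 4-byte node index + 4-byte tStart + 4-byte tEnd
WAVEFRONT_SIZE = 64

stack_bytes_per_thread = MAX_TREE_DEPTH * FRAME_BYTES
stack_bytes_per_wavefront = stack_bytes_per_thread * WAVEFRONT_SIZE
dwords_per_thread = stack_bytes_per_thread // 4   # 32-bit registers if kept in GPRs
```

That is 300 bytes, or 75 32-bit values, per thread for the stack alone, before any registers for the ray, the hit record or temporaries.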
  • 21. HOW TO DETECT SCRATCH REGISTERS USING CODEXL
  • 22. STACKLESS TRACE – RESTART TRAVERSAL [Figure: restart traversal on the 2D KD tree of nodes A to G, with the ray parameters tmin, t1, t2, t3 and tMax labeled.]
  • 23. KD RESTART ALGORITHM
        tStart = tEnd = sceneMin
        tHit = infinity
        while (tEnd < sceneMax):
            node = root
            tStart = tEnd
            tEnd = sceneMax
            while (not node.isLeaf()):
                axis = node.axis
                tSplit = (node.PlanePos - ray.origin[axis]) / ray.direction[axis]
                (near, far) = findNear(ray.origin[axis], node.left, node.right)
                if (tSplit >= tEnd or tSplit <= 0)
                    node = near
                else if (tSplit <= tStart)
                    node = far
                else
                    node = near
                    tEnd = tSplit
            for prim in node.primitives():
                tHit = min(tHit, prim.Intersect(ray))
            if tHit < tEnd:
                return tHit
        return tHit
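A runnable Python sketch of the restart algorithm above. As with any illustration here, the `Node` class, the 2D circle primitives and `hit_circle` are assumptions and not the deck's code; the zero-direction guard is likewise an addition.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical KD node: inner nodes have children, leaves hold circles."""
    axis: int = 0
    value: float = 0.0                          # split plane position
    left: "Node | None" = None
    right: "Node | None" = None
    prims: list = field(default_factory=list)   # (cx, cy, radius) tuples

    def is_leaf(self):
        return self.left is None


def hit_circle(origin, direction, circle):
    """Smallest t >= 0 where the ray enters the circle, or inf on a miss."""
    cx, cy, r = circle
    ox, oy = origin[0] - cx, origin[1] - cy
    dx, dy = direction
    a = dx * dx + dy * dy
    b = 2.0 * (ox * dx + oy * dy)
    c = ox * ox + oy * oy - r * r
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return math.inf
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else math.inf


def traverse_restart(root, origin, direction, t_min, t_max):
    """Stackless traversal: after each leaf, restart from the root at t_end."""
    t_start = t_end = t_min
    t_hit = math.inf
    while t_end < t_max:
        node, t_start, t_end = root, t_end, t_max
        while not node.is_leaf():
            a, d = node.axis, direction[node.axis]
            t_split = (node.value - origin[a]) / d if d != 0.0 else math.inf
            below = origin[a] < node.value or (origin[a] == node.value and d <= 0.0)
            near, far = (node.left, node.right) if below else (node.right, node.left)
            if t_split >= t_end or t_split <= 0.0:
                node = near
            elif t_split <= t_start:
                node = far                   # already past this plane: skip near side
            else:
                node, t_end = near, t_split  # remember where this leaf ends
        for prim in node.prims:
            t_hit = min(t_hit, hit_circle(origin, direction, prim))
        if t_hit < t_end:
            return t_hit
    return t_hit


root = Node(axis=0, value=0.0,
            left=Node(prims=[(-2.0, 3.0, 0.5)]),
            right=Node(prims=[(2.0, 0.0, 0.5)]))
t = traverse_restart(root, (-5.0, 0.0), (1.0, 0.0), 0.0, 10.0)
```

The only per-ray state carried between leaves is the pair `t_start`, `t_end`, which is the point of the technique: the stack disappears at the cost of re-descending from the root once per visited leaf.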
  • 24. EFFECT ON GPR SPILLAGE
  • 25. Demo
  • 27. CAN THIS BE FURTHER REFINED?
      ‒ Which on-chip memory aren’t we using? The LDS (Local Data Share).
      ‒ Short stack algorithm: initialize a stack smaller than the maximum depth of the tree. If it overflows, fall back to the KD restart algorithm.
      ‒ If we place the short stack in the LDS, what should the depth of the short stack be?
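A Python sketch of one common way to combine a short stack with restart. It is an assumption about the details, since the slide only names the idea: on overflow the oldest entry is dropped, and when the stack runs dry before the ray has left the scene, traversal restarts from the root. `Node`, the circle primitives and `hit_circle` are illustrative helpers, not the deck's code.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical KD node: inner nodes have children, leaves hold circles."""
    axis: int = 0
    value: float = 0.0
    left: "Node | None" = None
    right: "Node | None" = None
    prims: list = field(default_factory=list)   # (cx, cy, radius) tuples

    def is_leaf(self):
        return self.left is None


def hit_circle(origin, direction, circle):
    """Smallest t >= 0 where the ray enters the circle, or inf on a miss."""
    cx, cy, r = circle
    ox, oy = origin[0] - cx, origin[1] - cy
    dx, dy = direction
    a = dx * dx + dy * dy
    b = 2.0 * (ox * dx + oy * dy)
    c = ox * ox + oy * oy - r * r
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return math.inf
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else math.inf


def traverse_short_stack(root, origin, direction, t_min, t_max, depth):
    """Short-stack traversal; depth=0 degenerates to pure restart."""
    stack = deque(maxlen=depth)   # appending to a full deque drops the oldest entry
    t_start = t_end = t_min
    t_hit = math.inf
    while t_end < t_max:
        if stack:
            node, t_start, t_end = stack.pop()
        else:                                   # nothing saved: restart from the root
            node, t_start, t_end = root, t_end, t_max
        while not node.is_leaf():
            a, d = node.axis, direction[node.axis]
            t_split = (node.value - origin[a]) / d if d != 0.0 else math.inf
            below = origin[a] < node.value or (origin[a] == node.value and d <= 0.0)
            near, far = (node.left, node.right) if below else (node.right, node.left)
            if t_split >= t_end or t_split <= 0.0:
                node = near
            elif t_split <= t_start:
                node = far
            else:
                stack.append((far, t_split, t_end))
                node, t_end = near, t_split
        for prim in node.prims:
            t_hit = min(t_hit, hit_circle(origin, direction, prim))
        if t_hit < t_end:
            return t_hit
    return t_hit


# Four leaves along x (splits at -2, 0 and 2); only the last one holds a circle.
root = Node(axis=0, value=0.0,
            left=Node(axis=0, value=-2.0, left=Node(), right=Node()),
            right=Node(axis=0, value=2.0, left=Node(),
                       right=Node(prims=[(3.0, 0.0, 0.5)])))
hits = [traverse_short_stack(root, (-5.0, 0.0), (1.0, 0.0), 0.0, 10.0, depth)
        for depth in (0, 1, 8)]
```

All three depths return the same hit; they differ only in how often the descent from the root is repeated, which is the cost the short stack is meant to reduce.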
  • 28. HOW MANY WAVEFRONTS ARE EXECUTED CONCURRENTLY
      ‒ Use the CodeXL application trace to discover how many wavefronts are executed concurrently with stackless traversal.
  • 29. OCCUPANCY GRAPHS
  • 30. WHAT SHOULD BE THE SIZE OF THE SHORT STACK? 64 KB of LDS / 12 wavefronts / 64 threads / sizeof(Frame) ≈ 7 frames per thread, with sizeof(Frame) = 12 bytes as on slide 20.
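The same division in Python, using integer arithmetic so the rounding is explicit. The 12-byte frame is carried over from slide 20, and the figure of 12 concurrent wavefronts is taken from the slide as given.

```python
LDS_BYTES_PER_CU = 64 * 1024
CONCURRENT_WAVEFRONTS = 12    # from the slide
WAVEFRONT_SIZE = 64
FRAME_BYTES = 12              # stack frame size, as on slide 20

bytes_per_thread = LDS_BYTES_PER_CU // (CONCURRENT_WAVEFRONTS * WAVEFRONT_SIZE)
short_stack_depth = bytes_per_thread // FRAME_BYTES
lds_bytes_used = (short_stack_depth * FRAME_BYTES
                  * WAVEFRONT_SIZE * CONCURRENT_WAVEFRONTS)
```

A depth of 7 uses 64,512 of the 65,536 bytes; a depth of 8 would need 73,728 bytes and would reduce the number of wavefronts that fit.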
  • 31. Demo
  • 32. RESULTS [Bar chart comparing four variants: full stack, stackless, short stack, and short stack on LDS; vertical axis from 60 to 120.] Results are in million rays per second on Radeon™ HD 7970.
  • 33. Questions?
  • 34. DISCLAIMER & ATTRIBUTION The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. OpenCL™ is a trademark of Apple Inc. which is licensed to the Khronos organization. Linux™ is the trademark of Linus Torvalds. Microsoft™ and Windows™ are the trademarks of Microsoft Corp. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners. 
  • 35. REFERENCES
      ‒ Introduction to GCN: http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf
      ‒ GCN white paper: http://www.amd.com/us/Documents/GCN_Architecture_whitepaper.pdf
      ‒ CodeXL home page: http://developer.amd.com/tools-and-sdks/heterogeneous-computing/codexl/
      ‒ AMD OpenCL programmers guide: http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf