Foveated Ray Tracing for VR on Multiple GPUs

Takahiro Harada



  • 1. FOVEATED RAY TRACING FOR VR ON MULTIPLE GPUS
      TAKAHIRO HARADA, AMD
      12/2014
  • 2. INTRO
      • Ray Tracing + Foveated rendering + VR + Multiple GPUs == A lot of GPU compute!!
      • Compute fills a texture
      • Use GL/CL interop to display
      2 | DEC 3, 2014
  • 3. GPU RAY TRACING
      • Everything is written in compute
      • Our renderer is 100% OpenCL
          ‒ Win, Linux, OSX
          ‒ GPU, CPU
      • High quality rendering compared to raster graphics
  • 5. GPU RAY TRACING: IMPLEMENTATION CHOICES
      • A single big kernel
          ‒ Easy to port
          ‒ Works
      • Do you write only 1 pixel shader??
      • Drawbacks
          ‒ Performance: hurt by SIMD divergence and low GPU occupancy (uses too many VGPRs)
          ‒ Maintainability
          ‒ Extensibility
          ‒ Portability
          ‒ Debugging
      • Multiple kernel implementation
  • 6. HOW MANY WGS CAN WE EXECUTE PER SIMD (AMD GPU)
      • 10 wavefronts (64 WIs) per SIMD is the max
      • It depends on local resource usage of the kernel
      • VGPR usage is often the problem
      • Share 256 VGPRs among n work groups
          ‒ 1 wavefront, 256 VGPRs :( :(
          ‒ 2 wavefronts, 128 VGPRs
          ‒ 4 wavefronts, 64 VGPRs :)
          ‒ 10 wavefronts, 25 VGPRs
      • Share 16KB LDS among n work groups
          ‒ 1 work group, 16KB :( :(
          ‒ 2 work groups, 8KB
          ‒ 4 work groups, 4KB :)
      • VGPRs
          ‒ Registers used by vector ALUs
          ‒ 64KB/SIMD
          ‒ 256 VGPRs/SIMD lane (= 64KB/64/4)
      • LDS (Local data share)
          ‒ 64KB/CU (CU == 4 SIMDs)
          ‒ 32KB/SIMD
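The VGPR and LDS tables above follow from simple integer division, which can be sketched as below. This is a minimal sketch, not the vendor's occupancy model: the function names are hypothetical, the limits are the ones quoted on the slide (10 wavefronts, 256 VGPRs per lane, a 16KB LDS share per SIMD), each work group is assumed to be one 64-wide wavefront, and register allocation granularity is ignored.

```cpp
#include <algorithm>
#include <cassert>

// Limits quoted on the slide; treated here as fixed assumptions.
constexpr int kMaxWavefrontsPerSimd = 10;         // hardware cap per SIMD
constexpr int kVgprsPerLane         = 256;        // 64KB / 64 lanes / 4 bytes
constexpr int kLdsBytesPerSimd      = 16 * 1024;  // the slide's "Share 16KB LDS" budget

// Wavefronts that fit on one SIMD when each work item needs `vgprsPerWorkItem` registers.
int wavefrontsByVgpr(int vgprsPerWorkItem) {
    assert(vgprsPerWorkItem > 0 && vgprsPerWorkItem <= kVgprsPerLane);
    return std::min(kMaxWavefrontsPerSimd, kVgprsPerLane / vgprsPerWorkItem);
}

// Work groups that fit when each one allocates `ldsBytesPerGroup` bytes of LDS.
int workGroupsByLds(int ldsBytesPerGroup) {
    assert(ldsBytesPerGroup >= 0 && ldsBytesPerGroup <= kLdsBytesPerSimd);
    if (ldsBytesPerGroup == 0) return kMaxWavefrontsPerSimd;  // LDS is not a limit
    return std::min(kMaxWavefrontsPerSimd, kLdsBytesPerSimd / ldsBytesPerGroup);
}

// Resident wavefronts per SIMD: whichever resource runs out first decides.
// Assumes one wavefront per work group.
int occupancy(int vgprsPerWorkItem, int ldsBytesPerGroup) {
    return std::min(wavefrontsByVgpr(vgprsPerWorkItem),
                    workGroupsByLds(ldsBytesPerGroup));
}
```

For example, under these assumptions a kernel using 64 VGPRs and 8KB of LDS per work group is limited by LDS: `occupancy(64, 8 * 1024)` gives 2 rather than 4.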
  • 7. GPU RAY TRACING
      Single kernel implementation
          Host Code
              launch( RayTraceKernel );
          Device Code
              __kernel void RayTraceKernel();
      Multiple kernel implementation
          Host Code
              launch( PrimaryRayGenKernel );
              while(1)
              {
                  launch( TraceKernel );          // trace the current rays
                  if( !any( hits ) )
                      break;                      // nothing was hit: done
                  launch( SampleLightKernel );
                  launch( TraceKernel );          // trace the rays toward the sampled lights
                  launch( AccumulateDIKernel );
                  launch( SampleNextRayKernel );
              }
          Device Code
              __kernel void PrimaryRayGenKernel()
              __kernel void TraceKernel()
              __kernel void SampleLightKernel()
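The control flow of the multiple kernel version can be tried out on the CPU with a toy stand-in. This is only a sketch of the loop structure: each function plays the role of one kernel, the per-path `bouncesUntilEscape` field is an invented substitute for real scene intersection, and the light sampling and shadow-ray trace are folded into a single constant contribution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-path state that would live in global buffers between kernel launches.
struct PathState {
    int   bouncesUntilEscape;  // toy stand-in for scene geometry (assumption)
    bool  hit;                 // written by the trace pass
    float radiance;            // accumulated direct illumination
};

// Each function stands in for one small kernel; the loop over paths
// plays the role of the GPU's work items.
void primaryRayGen(std::vector<PathState>& paths) {
    for (std::size_t i = 0; i < paths.size(); ++i)
        paths[i] = PathState{static_cast<int>(i % 3), false, 0.0f};
}

void trace(std::vector<PathState>& paths) {
    for (PathState& p : paths) p.hit = p.bouncesUntilEscape > 0;
}

bool anyHit(const std::vector<PathState>& paths) {
    for (const PathState& p : paths)
        if (p.hit) return true;
    return false;
}

void accumulateDirect(std::vector<PathState>& paths) {
    for (PathState& p : paths)
        if (p.hit) p.radiance += 1.0f;  // toy: constant light contribution per hit
}

void sampleNextRay(std::vector<PathState>& paths) {
    for (PathState& p : paths)
        if (p.hit) --p.bouncesUntilEscape;
}

// Host loop with the same shape as the slide's host code.
// Returns the number of bounce iterations that were executed.
int renderFrame(std::vector<PathState>& paths) {
    constexpr int kMaxIterations = 64;  // safety cap for the toy scene
    primaryRayGen(paths);
    int iterations = 0;
    while (true) {
        trace(paths);
        if (!anyHit(paths)) break;
        accumulateDirect(paths);  // stands in for SampleLight + Trace + AccumulateDI
        sampleNextRay(paths);
        assert(iterations < kMaxIterations);
        ++iterations;
    }
    return iterations;
}
```

Because every pass reads and writes the shared path buffer, each stage stays small, which is the property the slide contrasts with the single big kernel.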
  • 8. RAY TRACING + VR
      ‒ Ray tracing is flexible
      ‒ Raster graphics: single projection matrix
      ‒ Can cast rays in arbitrary directions
      ‒ Easy to set up VR
      ‒ But performance isn't good enough
      ‒ Computation cost
          ‒ Scene complexity
          ‒ # of samples (rays)
      [Figure caption: Fully ray traced, but using baked textures :)]
  • 9. RAY TRACING + VR
      ‒ Ray tracing is flexible
      ‒ Raster graphics: single projection matrix
      ‒ Can cast rays in arbitrary directions
      ‒ Easy to set up VR
      ‒ But performance isn't good enough
      ‒ To speed it up,
          ‒ Reduce # of pixels to be shaded
      ‒ Pixel shading (sample) reduction
          ‒ Sample reuse (left & right)
          ‒ Foveated rendering
      [Figure caption: Fully ray traced, but using baked textures :)]
  • 10. SAMPLE REUSE
  • 11. FOVEATED RENDERING
      ‒ We can only see clearly where we are looking
      ‒ Shading at full rate everywhere is a waste of computation
      ‒ Steps
          ‒ Create a density map
          ‒ Ray trace 1 sample for each area
          ‒ Reconstruct full resolution image
  • 15. 1. DENSITY MAP DATA STRUCTURE
      ‒ 2 data structures are precomputed
      ‒ Array<float2> samples( M )
          ‒ Sample position
          ‒ Normalized coordinates (x, y)
      ‒ Array<NeighborInfo> neighborInfo( N )
          ‒ For frame reconstruction
          ‒ Sample id[k]
          ‒ Sample weight[k]
      ‒ # of pixels: N
      ‒ # of samples: M
  • 16. 2. ASSIGN A UNIQUE INDEX FOR EACH SAMPLE
      ‒ Execute a work item for each sample in the pattern
      ‒ Check which samples are in the rendered area
      ‒ Use an atomic increment to get a unique index
          ‒ Count: # of samples
          ‒ Unique indices
      As multiple samples are taken for a render(), unique indices are necessary to identify storage locations
      [Figure: sample IDs in the rendering area compacted into the Samples, Ray, and Color buffers, with a Count]
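The index assignment on this slide is essentially a stream compaction. Here is a minimal CPU-side sketch of it, assuming the rendered area is an axis-aligned rectangle in normalized coordinates. The function name and the rectangle representation are hypothetical, and a plain counter stands in for the atomic increment, so the slot order here is deterministic while on a GPU it would not be.

```python
def assign_unique_indices(samples, area):
    """Compact the IDs of samples that fall inside `area`.

    samples: list of (x, y) normalized sample positions (the precomputed pattern).
    area:    (x0, y0, x1, y1) rectangle rendered by this call (assumed layout).
    Returns (sample_ids, count), where sample_ids[slot] is the pattern index
    stored at that unique slot.
    """
    x0, y0, x1, y1 = area
    sample_ids = []
    counter = 0  # stands in for a global counter updated with an atomic increment
    for sample_id, (x, y) in enumerate(samples):  # one "work item" per sample
        if x0 <= x < x1 and y0 <= y < y1:
            slot = counter      # an atomic increment returns the old value...
            counter += 1        # ...and bumps the shared counter
            assert slot == len(sample_ids)
            sample_ids.append(sample_id)
    return sample_ids, counter


if __name__ == "__main__":
    pattern = [(0.1, 0.1), (0.6, 0.2), (0.3, 0.9), (0.9, 0.9)]
    print(assign_unique_indices(pattern, (0.0, 0.0, 0.5, 1.0)))
```

The returned count sizes the later launches, and each slot indexes the ray and color buffers for that sample.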
  • 17. 3. GENERATE PRIMARY RAYS
      ‒ Execute a work item for each sample in the range
      ‒ Read sampleID
      ‒ Read sample coordinates
      ‒ Generate a primary ray
      ‒ Store to ray buffer
      [Figure: Samples, Ray, and Color buffers, with a Count]
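The per-sample work in this step could be sketched as follows. The deck does not describe the camera model, so this sketch assumes a pinhole camera at the origin looking down the negative z axis. The `fov_y_deg` and `aspect` parameters and both function names are hypothetical.

```python
import math


def generate_primary_ray(x, y, fov_y_deg=90.0, aspect=1.0):
    """Map a normalized sample position (x, y in [0, 1]) to a ray.

    Assumed convention: (0, 0) is the top-left corner of the image plane.
    Returns (origin, unit_direction).
    """
    half_h = math.tan(math.radians(fov_y_deg) / 2.0)
    half_w = half_h * aspect
    dx = (2.0 * x - 1.0) * half_w
    dy = (1.0 - 2.0 * y) * half_h
    dz = -1.0
    inv_len = 1.0 / math.sqrt(dx * dx + dy * dy + dz * dz)
    return (0.0, 0.0, 0.0), (dx * inv_len, dy * inv_len, dz * inv_len)


def generate_primary_rays(samples, sample_ids):
    """One 'work item' per compacted slot: read sampleID, read coords, store ray."""
    ray_buffer = []
    for slot in range(len(sample_ids)):
        x, y = samples[sample_ids[slot]]
        ray_buffer.append(generate_primary_ray(x, y))
    return ray_buffer
```

Because rays are generated per sample rather than per pixel, the same routine works for any sample distribution, which is what makes foveated sampling straightforward in a ray tracer.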
  • 18. 4. RAY TRACE
      ‒ Execute a work item for each generated ray
      ‒ Trace ray + shade
      [Figure: Samples, Ray, and Color buffers, with a Count]
  • 19. 5. RECONSTRUCT FRAME BUFFER
      ‒ Execute a work item for each pixel
      ‒ Weighted blend of k neighbors
      ‒ Go through the list of neighbors and fetch the computed pixel colors
      [Figure: input samples and reconstructed output image]
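The reconstruction pass can be sketched on the CPU as below. The per-pixel `(ids, weights)` tuples stand in for the precomputed NeighborInfo entries. Dividing by the weight sum is an assumption of this sketch (the precomputed weights may already be normalized), and the function name is hypothetical.

```python
def reconstruct_frame(neighbor_info, sample_colors):
    """Blend the k neighbor samples of every pixel.

    neighbor_info: list with one (ids, weights) pair per pixel.
    sample_colors: mapping from sample id to an (r, g, b) tuple.
    Returns one (r, g, b) tuple per pixel.
    """
    frame = []
    for ids, weights in neighbor_info:          # one "work item" per pixel
        rgb = [0.0, 0.0, 0.0]
        total = 0.0
        for sample_id, weight in zip(ids, weights):
            color = sample_colors[sample_id]    # fetch the traced sample color
            for channel in range(3):
                rgb[channel] += weight * color[channel]
            total += weight
        if total > 0.0:                          # assumed normalization step
            rgb = [value / total for value in rgb]
        frame.append(tuple(rgb))
    return frame


if __name__ == "__main__":
    colors = {0: (1.0, 0.0, 0.0), 1: (0.0, 0.0, 1.0)}
    info = [([0], [1.0]), ([0, 1], [3.0, 1.0])]
    print(reconstruct_frame(info, colors))
```

Since the neighbor lists and weights depend only on the sample pattern, they can be computed once and reused for every frame.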
  • 20. 6. APPLY DISTORTION AND RENDER LR
      ‒ Render to L and R (left and right eye views)
      ‒ Execute a work item for each pixel in the frame buffer
      ‒ Check if it is L or R
      ‒ Look up pixel value
      ‒ Chromatic separation
      ‒ Barrel distortion
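A rough sketch of the per-pixel lookup in this step follows. The deck gives no formula, so this assumes a common radial polynomial model for barrel distortion and a slightly different radial scale per color channel for chromatic separation. The coefficients `k1`, `k2`, the `chroma_scales`, the side-by-side layout, and both function names are placeholders rather than values from the talk.

```python
def distort(u, v, k1=0.22, k2=0.24, scale=1.0):
    """Radially scale lens-centered coordinates (u, v in [-1, 1]).

    Sampling the source farther out as the radius grows produces a
    barrel-distorted output image. k1, k2 are placeholder coefficients.
    """
    r2 = u * u + v * v
    factor = scale * (1.0 + k1 * r2 + k2 * r2 * r2)
    return u * factor, v * factor


def lookup_coords(px, py, width, height, chroma_scales=(0.994, 1.0, 1.006)):
    """For one output pixel, return which eye it belongs to and the source
    coordinates to fetch for the R, G and B channels.

    Assumes a side-by-side layout: left half is L, right half is R.
    """
    half_width = width // 2
    eye = "L" if px < half_width else "R"
    local_x = px if eye == "L" else px - half_width
    u = 2.0 * (local_x + 0.5) / half_width - 1.0
    v = 2.0 * (py + 0.5) / height - 1.0
    return eye, [distort(u, v, scale=s) for s in chroma_scales]
```

Each channel is then fetched from the reconstructed eye image at its own coordinates; lookups that fall outside the source image would typically be written as black.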
  • 21. RESULT
      ‒ # of samples is reduced to 5% of full-rate shading
      ‒ Could make it faster (10~30fps)
      ‒ Still not fast enough for VR
      ‒ Reduce the number of samples further?
  • 22. USING MULTIPLE GPUS FOR LATENCY CRITICAL APPLICATION
  • 23. HOW TO USE MULTIPLE GPUS
      ‒ Alternate frame rendering
          ‒ Assign the rendering of a whole frame to one GPU
          ‒ Time to finish a frame doesn't change
      ‒ Frame split
          ‒ Split a frame and all GPUs work on the frame
          ‒ Can reduce the time to finish a frame
      ‒ Frame split is better for our purpose
  • 24. CHALLENGE OF FRAME SPLIT
      ‒ Load balancing issue
      ‒ One GPU may finish immediately while another keeps running for much longer
      ‒ Workload of each pixel can be different
      ‒ Foveated rendering makes it worse
          ‒ Shading point density is not uniform on the screen
  • 25. SEMI STATIC LOAD BALANCING
      ‒ Load balancing once for each frame rendering step
      ‒ Use statistics from the previous frame to load balance
      ‒ Start from an even split
      ‒ At each frame
          ‒ Render the assigned area
          ‒ Each GPU reports # of samples processed and time to complete the work
          ‒ Compute processing speed for GPU i:
              p_i = n_i / t_i
          ‒ With perfect load balancing, the time to finish the work is
              t = (sum_i n_i) / (sum_i p_i)
          ‒ The work GPU i can process in time t is
              n_i = t p_i
          ‒ Compute the next frame split from the CDF of the sample distribution
      [Figure: # of samples plotted over screen area, split into areas A0..A3 containing n0..n3 samples]
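The balancing step above can be sketched as follows. This assumes the frame is split into vertical strips and that a per-column sample count is available for the next frame. Both are assumptions of the sketch, as are the function name and the rescaling of the targets to the next frame's total.

```python
from bisect import bisect_right
from itertools import accumulate


def balanced_split(samples_done, times, samples_per_column):
    """Compute per-GPU sample targets and column boundaries for the next frame.

    samples_done[i], times[i]: n_i and t_i reported by GPU i for the last frame.
    samples_per_column:        # of samples in each screen column (assumed input).
    Returns (targets, boundaries); GPU i renders columns
    [boundaries[i-1], boundaries[i]), with implicit 0 and len(columns) at the ends.
    """
    speeds = [n / t for n, t in zip(samples_done, times)]   # p_i = n_i / t_i
    t_ideal = sum(samples_done) / sum(speeds)                # t = sum n_i / sum p_i
    targets = [t_ideal * p for p in speeds]                  # n_i = t p_i

    cdf = list(accumulate(samples_per_column))               # CDF of sample counts
    scale = cdf[-1] / sum(targets)       # match the next frame's sample total
    boundaries = []
    cumulative = 0.0
    for target in targets[:-1]:
        cumulative += target * scale
        # Number of leading columns whose cumulative count fits the target.
        boundaries.append(bisect_right(cdf, cumulative))
    return targets, boundaries


if __name__ == "__main__":
    print(balanced_split([120, 80], [1.0, 1.0], [10, 30, 80, 60, 20]))
```

For example, two GPUs that each processed 100 samples in 1 and 4 time units get targets of about 160 and 40 samples, so the faster GPU receives the larger strip in the next frame.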
  • 26. APPLYING TO FOVEATED RENDERING
      ‒ Samples inside the assigned area of the frame buffer are not enough
      ‒ Samples in the other areas are not in the GPU's memory
      ‒ We need to reconstruct the frame buffer from neighbor samples
      ‒ Gather samples which have at least 1 neighbor in the assigned area
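The gathering rule on this slide can be sketched as a pass over the precomputed neighbor lists. This sketch assumes a row-major frame buffer and an assigned area given as a column range. The function name and data layout are hypothetical.

```python
def gather_needed_samples(neighbor_ids, width, column_range):
    """Collect every sample that contributes to at least one assigned pixel.

    neighbor_ids: per-pixel list of neighbor sample ids, in row-major order.
    width:        frame buffer width in pixels.
    column_range: (x0, x1) half-open range of columns assigned to this GPU.
    Returns the sorted list of sample ids this GPU has to trace.
    """
    x0, x1 = column_range
    needed = set()
    for pixel, ids in enumerate(neighbor_ids):
        if x0 <= pixel % width < x1:   # pixel lies in the assigned area
            needed.update(ids)         # all of its neighbors must be available
    return sorted(needed)


if __name__ == "__main__":
    info = [[0, 1], [1, 2], [2, 3], [3, 4]]   # a 4x1 frame buffer
    print(gather_needed_samples(info, 4, (0, 2)))
    print(gather_needed_samples(info, 4, (2, 4)))
```

In this toy case sample 2 is needed by both GPUs, so samples near a split boundary are traced redundantly in exchange for not transferring sample data between GPUs.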
  • 27. RESULT
      ‒ More than 60fps on 4 GPUs
          ‒ 6M triangles
          ‒ 32 shadow rays/sample
          ‒ 2 AA rays/sample
      [Figure: Crytek Sponza (0.26M tris), ~12ms/frame, 32 shadow rays/sample, 4x AMD FirePro W9000 GPUs]
      [Figure: Rungholt (6.7M tris), ~12ms/frame, 32 shadow rays/sample, 4x AMD FirePro W9000 GPUs]
  • 28. CLOSING THE TALK
      ‒ Showed an example of a rendering pipeline 100% written in GPU compute
      ‒ Showed how to extend a ray tracer for VR
      ‒ Showed a fully manual usage of multiple GPUs
          ‒ vs. fully automatic by driver (Crossfire)
  • 29. CLOSING THE TALK
      ‒ Foveated Real-Time Ray Tracing for Virtual Reality Headset
      ‒ Ray Tracing Irregularly Distributed Samples on Multiple GPUs
      ‒ http://research.lighttransport.com/foveated-real-time-ray-tracing-for-virtual-reality-headset/index.html
      ‒ Thanks to Masahiro Fujita@Light Transport Entertainment Inc.