Slide 2 | DEC 3, 2014
INTRO
• Ray Tracing + Foveated rendering + VR + Multiple GPUs == A lot of GPU compute!!
• Compute fills a texture
• Use GL/CL interop to display

Slide 3
GPU RAY TRACING
• Everything is written in compute
• Our renderer is 100% OpenCL
  - Win, Linux, OSX
  - GPU, CPU
• High-quality rendering compared to raster graphics

Slide 5
GPU RAY TRACING: IMPLEMENTATION CHOICES
• A single big kernel
  - Easy to port
  - Works
• Do you write only 1 pixel shader??
• Drawbacks
  - Performance: limited by SIMD divergence and GPU occupancy (uses too many VGPRs)
  - Maintainability
  - Extensibility
  - Portability
  - Debugging
• Multiple kernel implementation

Slide 6
HOW MANY WGs CAN WE EXECUTE PER SIMD (AMD GPU)
• 10 wavefronts (64 WIs each) per SIMD is the max
• It depends on the local resource usage of the kernel
• VGPR usage is often the problem
• Share 256 VGPRs among n wavefronts
  - 1 wavefront, 256 VGPRs :( :(
  - 2 wavefronts, 128 VGPRs
  - 4 wavefronts, 64 VGPRs :)
  - 10 wavefronts, 25 VGPRs
• Share 16KB LDS among n work groups
  - 1 work group, 16KB :( :(
  - 2 work groups, 8KB
  - 4 work groups, 4KB :)
• VGPRs
  - Registers used by vector ALUs
  - 64KB/SIMD
  - 256 VGPRs/SIMD lane (= 64KB / 64 lanes / 4 bytes)
• LDS (Local Data Share)
  - 64KB/CU (CU == 4 SIMDs)
  - 16KB/SIMD (= 64KB / 4)
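
The occupancy rule on this slide is integer division of a fixed per-SIMD budget, capped at 10 wavefronts. A minimal Python sketch of that arithmetic, assuming each work group is a single 64-wide wavefront; the function `waves_per_simd` and its constants are hypothetical, and real GCN hardware also rounds VGPR and LDS allocations up to fixed granules, which the sketch ignores.

```python
# Hypothetical occupancy estimate using the per-SIMD budgets from this slide.
# Real GCN hardware rounds VGPR and LDS allocations up to fixed granules,
# which this sketch deliberately ignores.
MAX_WAVES_PER_SIMD = 10          # hardware cap quoted on the slide
VGPRS_PER_LANE = 256             # 64KB / 64 lanes / 4 bytes
LDS_BYTES_PER_SIMD = 16 * 1024   # the 16KB LDS share used in the example


def waves_per_simd(vgprs_per_work_item: int, lds_bytes_per_work_group: int = 0) -> int:
    """How many single-wavefront work groups fit on one SIMD at once."""
    limit = MAX_WAVES_PER_SIMD
    if vgprs_per_work_item > 0:
        limit = min(limit, VGPRS_PER_LANE // vgprs_per_work_item)
    if lds_bytes_per_work_group > 0:
        limit = min(limit, LDS_BYTES_PER_SIMD // lds_bytes_per_work_group)
    return limit


if __name__ == "__main__":
    for vgprs in (256, 128, 64, 25):
        print(f"{vgprs} VGPRs -> {waves_per_simd(vgprs)} wavefront(s)")
    print(f"25 VGPRs + 8KB LDS -> {waves_per_simd(25, 8 * 1024)} work group(s)")
```

The VGPR rows reproduce the 1, 2, 4 and 10 wavefront figures above; adding 8KB of LDS per work group caps the same 25-VGPR kernel at 2 work groups, which is why both budgets matter.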

Slide 8
RAY TRACING + VR
• Ray tracing is flexible
• Raster graphics: a single projection matrix
• Can cast rays in arbitrary directions
• Easy to set up VR
• But performance isn't good enough
• Computation cost
  - Scene complexity
  - # of samples (rays)
[Image: Fully ray traced, but using baked textures :)]
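
Because every ray is generated independently, the camera can map pixels to directions that no single projection matrix can express. A minimal sketch, assuming a 360-degree equirectangular mapping and a hypothetical interpupillary distance `ipd`; the function `equirect_eye_ray` is illustrative, not the renderer's camera code. It also simplifies stereo by using a fixed horizontal eye offset, whereas true omnidirectional stereo rotates that offset with the viewing direction.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def equirect_eye_ray(u: float, v: float, eye: str, ipd: float = 0.064) -> Tuple[Vec3, Vec3]:
    """Ray for normalized coordinate (u, v) in [0, 1]^2 of a 360-degree panorama.

    u maps to longitude in [-pi, pi], v to latitude in [pi/2, -pi/2].
    Camera convention (an assumption): -Z forward, +X right, +Y up.
    The two eyes differ only by a fixed horizontal origin offset of ipd / 2.
    """
    lon = (2.0 * u - 1.0) * math.pi
    lat = (0.5 - v) * math.pi
    # Unit-length direction on the sphere; no projection matrix involved.
    direction = (math.cos(lat) * math.sin(lon),
                 math.sin(lat),
                 -math.cos(lat) * math.cos(lon))
    x = -ipd / 2.0 if eye == "L" else ipd / 2.0
    return (x, 0.0, 0.0), direction
```

For example, `equirect_eye_ray(0.5, 0.5, "L")` returns origin `(-0.032, 0.0, 0.0)` and direction `(0.0, 0.0, -1.0)`, the forward ray of the left eye.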

Slide 9
RAY TRACING + VR
• To speed it up:
  - Reduce the # of pixels to be shaded
• Pixel shading (sample) reduction
  - Sample reuse (left & right)
  - Foveated rendering

Slide 11
FOVEATED RENDERING
• We can only see clearly where we are looking
• Shading at full rate everywhere is a waste of computation
• Steps
  - Create a density map
  - Ray trace 1 sample for each area
  - Reconstruct the full-resolution image
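
The first step, creating a density map, can be illustrated with a radial falloff around the gaze point. A minimal sketch, assuming a hypothetical falloff (full rate inside `fovea_radius`, then proportional to `fovea_radius / r`, clamped at `min_density`) and random thinning of pixel centers; the talk does not specify its density function or sampling method.

```python
import math
import random
from typing import List, Tuple


def density(x: float, y: float, gaze: Tuple[float, float],
            fovea_radius: float = 0.1, min_density: float = 0.02) -> float:
    """Fraction of pixels to shade at normalized position (x, y).

    Hypothetical falloff: 1.0 inside the fovea, then fovea_radius / r,
    clamped to min_density in the far periphery.
    """
    r = math.hypot(x - gaze[0], y - gaze[1])
    if r <= fovea_radius:
        return 1.0
    return max(min_density, fovea_radius / r)


def sample_pattern(width: int, height: int, gaze: Tuple[float, float],
                   seed: int = 0) -> List[Tuple[float, float]]:
    """Keep each pixel center with probability density(); returns normalized (x, y)."""
    rng = random.Random(seed)  # fixed seed so the precomputed pattern is reproducible
    samples = []
    for j in range(height):
        for i in range(width):
            x, y = (i + 0.5) / width, (j + 0.5) / height
            if rng.random() < density(x, y, gaze):
                samples.append((x, y))
    return samples
```

Every pixel inside the fovea is kept, while the periphery is thinned out, so the total sample count drops well below one per pixel.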

Slide 15
1. DENSITY MAP DATA STRUCTURE
• 2 data structures are precomputed
• Array<float2> samples(M)
  - Sample position
  - Normalized coordinate (x, y)
• Array<NeighborInfo> neighborInfo(N)
  - For frame reconstruction
  - Sample id[k]
  - Sample weight[k]
• # of pixels: N
• # of samples: M
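
The two precomputed arrays map naturally onto plain lists. A sketch of how `neighborInfo` could be built from `samples`, assuming k-nearest-neighbor selection by brute force and normalized inverse-distance weights; both choices, and the names `NeighborInfo.ids`, `NeighborInfo.weights` and `build_neighbor_info`, are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NeighborInfo:
    """Per-pixel reconstruction record: k sample ids and their blend weights."""
    ids: List[int]
    weights: List[float]


def build_neighbor_info(samples: List[Tuple[float, float]], width: int, height: int,
                        k: int = 4) -> List[NeighborInfo]:
    """Precompute neighborInfo(N) for N = width * height pixels (row-major).

    Assumption: neighbors are the k nearest samples (brute force) and weights
    are normalized inverse distances; the talk does not specify either choice.
    """
    infos = []
    for j in range(height):
        for i in range(width):
            px, py = (i + 0.5) / width, (j + 0.5) / height
            nearest = sorted(
                (math.hypot(sx - px, sy - py), sid) for sid, (sx, sy) in enumerate(samples)
            )[:k]
            inv = [1.0 / max(d, 1e-6) for d, _ in nearest]  # clamp avoids division by zero
            total = sum(inv)
            infos.append(NeighborInfo(ids=[sid for _, sid in nearest],
                                      weights=[w / total for w in inv]))
    return infos
```

Brute force costs O(N·M) per build, which is acceptable only because the pattern is precomputed once; a spatial grid or k-d tree would be the usual replacement for large N and M.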

Slide 16
2. ASSIGN A UNIQUE INDEX FOR EACH SAMPLE
• Execute a work item for each sample in the pattern
• Check whether the sample is in the rendered area
• Use atomic inc to get a unique index
  - Count: # of samples
  - Unique indices
• As multiple samples are taken for a render(), unique indices are necessary to identify each sample's storage location
[Figure: Samples, Rendering Area, Count, Ray and Color buffers]
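
On the GPU this step is a stream compaction: each work item that passes the area test calls `atomic_inc` on a global counter and uses the returned old value as its slot. The sketch below emulates that in Python with threads and a lock-protected counter; `AtomicCounter`, `assign_indices`, and the `(x0, y0, x1, y1)` area layout are hypothetical.

```python
import threading
from typing import List, Tuple


class AtomicCounter:
    """Stand-in for OpenCL atomic_inc: returns the value before incrementing."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def inc(self) -> int:
        with self._lock:
            old = self._value
            self._value += 1
            return old

    @property
    def value(self) -> int:
        return self._value


def assign_indices(samples: List[Tuple[float, float]],
                   area: Tuple[float, float, float, float],
                   num_threads: int = 4) -> List[int]:
    """One "work item" per pattern sample; samples inside area get a unique slot.

    Returns sample_ids, where sample_ids[slot] is the pattern index stored in
    that slot; len(sample_ids) is the Count. Slot order is not deterministic.
    """
    counter = AtomicCounter()
    sample_ids = [-1] * len(samples)
    x0, y0, x1, y1 = area

    def work_item(gid: int) -> None:
        x, y = samples[gid]
        if x0 <= x < x1 and y0 <= y < y1:   # is the sample in the rendered area?
            sample_ids[counter.inc()] = gid

    def worker(start: int) -> None:
        for gid in range(start, len(samples), num_threads):
            work_item(gid)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sample_ids[:counter.value]
```

The returned slot order depends on thread scheduling, just as it does on the GPU, so later passes must index by slot rather than assume any ordering.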

Slide 17
3. GENERATE PRIMARY RAYS
• Execute a work item for each sample in the range [0, Count)
• Read sampleID
• Read sample coordinates
• Generate a primary ray
• Store to the ray buffer
[Figure: Samples, Count, Ray and Color buffers]
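
A sketch of the primary-ray pass, assuming a simple pinhole camera looking down -Z and the compacted index list produced by the previous step; the function name, the field of view, and the camera convention are assumptions for illustration.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Ray = Tuple[Vec3, Vec3]


def generate_primary_rays(sample_ids: List[int], samples: List[Tuple[float, float]],
                          fov_y: float = math.radians(60.0), aspect: float = 1.0,
                          eye: Vec3 = (0.0, 0.0, 0.0)) -> List[Ray]:
    """One work item per slot in [0, Count): ray_buffer[slot] holds its ray.

    Assumes a pinhole camera at `eye` looking down -Z (+X right, +Y up).
    """
    half_h = math.tan(fov_y / 2.0)
    half_w = half_h * aspect
    ray_buffer = []
    for sid in sample_ids:                 # read sampleID
        u, v = samples[sid]                # read sample coordinates
        d = ((2.0 * u - 1.0) * half_w, (1.0 - 2.0 * v) * half_h, -1.0)
        inv_len = 1.0 / math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
        # generate the primary ray and store it in the ray buffer
        ray_buffer.append((eye, (d[0] * inv_len, d[1] * inv_len, d[2] * inv_len)))
    return ray_buffer
```

Keeping the ray buffer indexed by slot means the color written in the next pass lands at the same index, so no extra mapping is needed until reconstruction.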

Slide 18
4. RAY TRACE
• Execute a work item for each generated ray
• Trace ray + shade
[Figure: Samples, Count, Ray and Color buffers]
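
The trace-and-shade pass writes one color per ray slot. As a stand-in for the real renderer, the sketch below intersects rays with spheres and applies Lambert shading from one directional light; the scene representation, `trace_and_shade`, and the absence of shadow rays are all simplifying assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def hit_sphere(o: Vec3, d: Vec3, center: Vec3, radius: float) -> Optional[float]:
    """Nearest t > epsilon with |o + t d - center| = radius (d normalized), else None."""
    oc = (o[0] - center[0], o[1] - center[1], o[2] - center[2])
    b = dot(oc, d)
    c = dot(oc, oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    s = math.sqrt(disc)
    for t in (-b - s, -b + s):
        if t > 1e-4:
            return t
    return None


def trace_and_shade(rays: Sequence[Tuple[Vec3, Vec3]],
                    spheres: Sequence[Tuple[Vec3, float]],
                    light_dir: Vec3, background: float = 0.0) -> List[float]:
    """One work item per ray: color_buffer[i] is the shade for ray_buffer[i].

    Hypothetical scene: spheres as (center, radius), one directional light
    (light_dir normalized, pointing toward the light), grayscale Lambert shading.
    """
    colors = []
    for o, d in rays:
        best_t, best = math.inf, None
        for center, radius in spheres:          # trace: keep the closest hit
            t = hit_sphere(o, d, center, radius)
            if t is not None and t < best_t:
                best_t, best = t, (center, radius)
        if best is None:
            colors.append(background)
            continue
        center, radius = best                   # shade: n . l, clamped at zero
        p = (o[0] + best_t * d[0], o[1] + best_t * d[1], o[2] + best_t * d[2])
        n = tuple((p[k] - center[k]) / radius for k in range(3))
        colors.append(max(0.0, dot(n, light_dir)))
    return colors
```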

Slide 19
5. RECONSTRUCT FRAME BUFFER
• Execute a work item for each pixel
• Weighted blend of k neighbors
• Go through the list of neighbors and fetch the computed sample colors
[Figure: Input, Output]
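
The reconstruction pass is a gather per pixel. A sketch, assuming the neighbor ids and weights from the precomputed `neighborInfo` and a hypothetical `slot_of_sample` map from pattern sample ids to the unique indices assigned earlier; skipping unrendered neighbors and renormalizing the remaining weights is an assumption, not something the slide states.

```python
from typing import Dict, List, Sequence


def reconstruct(neighbor_ids: Sequence[Sequence[int]],
                neighbor_weights: Sequence[Sequence[float]],
                slot_of_sample: Dict[int, int],
                colors: Sequence[float]) -> List[float]:
    """One work item per pixel: weighted blend of its k neighbor samples.

    neighbor_ids / neighbor_weights mirror NeighborInfo (sample id[k],
    sample weight[k]); colors is indexed by the slot from step 2.
    """
    frame = []
    for ids, weights in zip(neighbor_ids, neighbor_weights):
        acc, wsum = 0.0, 0.0
        for sid, w in zip(ids, weights):
            slot = slot_of_sample.get(sid)
            if slot is not None:           # skip neighbors that were not rendered
                acc += w * colors[slot]
                wsum += w
        frame.append(acc / wsum if wsum > 0.0 else 0.0)
    return frame
```

Renormalizing keeps brightness stable near the edge of the rendered area, where some of a pixel's k neighbors may not have been traced.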

Slide 20
6. APPLY DISTORTION AND RENDER LR
• Render to LR (left and right eyes)
• Execute a work item for each pixel in the frame buffer
• Check if it is L or R
• Look up the pixel value
• Chromatic separation
• Barrel distortion
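
The per-pixel lookup can be sketched as: decide the eye from the x coordinate, then scale the radius from the lens center by a barrel polynomial, with a slightly different scale per color channel. All coefficients here, the side-by-side layout, and the `source_coords` function are hypothetical; real values come from the headset's lens model.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lens parameters; real values come from the headset SDK.
K = (1.0, 0.22, 0.24, 0.0)                          # barrel polynomial in r^2
CHROMA_SCALE = {"r": 0.996, "g": 1.0, "b": 1.014}   # per-channel radial scale


def source_coords(px: int, py: int, width: int, height: int
                  ) -> Tuple[str, Dict[str, Optional[Tuple[float, float]]]]:
    """For one output pixel, return its eye and, per color channel, the
    normalized (u, v) to sample in that eye's undistorted image, or None
    when the lookup falls outside the image (the channel stays black).

    Assumes a side-by-side frame buffer with the lens center in the middle
    of each half.
    """
    half = width // 2
    eye = "L" if px < half else "R"                 # check if it is L or R
    local_x = px - (0 if eye == "L" else half)
    x = 2.0 * (local_x + 0.5) / half - 1.0          # [-1, 1] around lens center
    y = 2.0 * (py + 0.5) / height - 1.0
    r2 = x * x + y * y
    barrel = K[0] + K[1] * r2 + K[2] * r2 * r2 + K[3] * r2 * r2 * r2
    out: Dict[str, Optional[Tuple[float, float]]] = {}
    for ch, s in CHROMA_SCALE.items():              # chromatic separation
        sx, sy = x * barrel * s, y * barrel * s     # barrel distortion
        if -1.0 <= sx <= 1.0 and -1.0 <= sy <= 1.0:
            out[ch] = ((sx + 1.0) / 2.0, (sy + 1.0) / 2.0)
        else:
            out[ch] = None
    return eye, out
```

Sampling the undistorted image at a larger radius than the output pixel compresses the periphery, which is the pre-distortion that the headset lens then undoes.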

Slide 21
RESULT
• The # of samples is reduced to 5% of full-rate shading
• Could make it faster (10~30 fps)
• Still not fast enough for VR
• Reduce the number of samples even further?

Slide 23
HOW TO USE MULTIPLE GPUS
• Alternate frame rendering
  - Assign each frame to one GPU
  - Time to finish a frame doesn't change
• Frame split
  - Split a frame so that all GPUs work on the same frame
  - Can reduce the time to finish a frame
• Frame split is better for our purpose

Slide 24
CHALLENGE OF FRAME SPLIT
• Load balancing issue
• One GPU may finish immediately while another keeps running forever
• The workload of each pixel can be different
• Foveated rendering makes it worse
  - Shading point density is not uniform across the screen

Slide 25
SEMI-STATIC LOAD BALANCING
• Load balance once for each frame rendering step
• Use statistics from the previous frame to load balance
• Start from an even split
• At each frame
  - Render the assigned area
  - Each GPU reports the # of samples processed, $n_i$, and the time to complete the work, $t_i$
  - Compute the processing speed of GPU $i$: $p_i = n_i / t_i$
  - With perfect load balancing, the time to finish the work is $t = \sum_i n_i / \sum_i p_i$
  - The work GPU $i$ can process in time $t$ is $n_i' = t\, p_i$
  - Compute the next frame split from the CDF of the sample distribution
[Figure: # of Samples (CDF) against Area; sample counts n0 to n3 map to areas A0 to A3]
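
The balancing rule above can be sketched as follows, assuming the split is along screen columns and the previous frame's samples are binned per column; `next_split` and the column layout are assumptions, since the slide only speaks of areas. Boundaries are rounded up to whole columns.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Sequence


def next_split(samples_per_column: Sequence[int], n: Sequence[int],
               t: Sequence[float]) -> List[int]:
    """Column boundaries [0, b_1, ..., W]; GPU i renders b_i <= x < b_{i+1}.

    n[i], t[i]: samples processed and time taken by GPU i in the last frame.
    samples_per_column: the sample distribution whose CDF is inverted.
    """
    p = [ni / ti for ni, ti in zip(n, t)]          # p_i = n_i / t_i
    t_ideal = sum(n) / sum(p)                      # t = sum_i n_i / sum_i p_i
    n_next = [t_ideal * pi for pi in p]            # n_i' = t * p_i
    total = sum(samples_per_column)
    cdf = list(accumulate(samples_per_column))
    bounds, acc = [0], 0.0
    for target in n_next[:-1]:
        acc += target / sum(n) * total             # rescale to this histogram
        k = bisect_left(cdf, acc)                  # first column reaching the target
        bounds.append(min(k + 1, len(cdf)))
    bounds.append(len(cdf))
    return bounds
```

If GPU 1 took twice as long as GPU 0 last frame (`t = [1.0, 2.0]`), 80 samples spread evenly over 8 columns are split at column 6 instead of 4.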

Slide 26
APPLYING TO FOVEATED RENDERING
• The samples inside a GPU's assigned area of the frame buffer are not enough
• Samples in other areas are not in that GPU's memory
• We need to reconstruct the frame buffer from neighbor samples
• Gather the samples that have at least 1 neighbor (pixel) in the assigned area
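
Gathering the required samples is a union over the assigned pixels' neighbor lists. A sketch, assuming a column-range split and the row-major per-pixel `sample id[k]` lists from the density map; `gather_samples` is a hypothetical name.

```python
from typing import List, Sequence


def gather_samples(neighbor_ids: Sequence[Sequence[int]], width: int,
                   x0: int, x1: int) -> List[int]:
    """Sample ids a GPU must trace to reconstruct columns x0 <= x < x1.

    neighbor_ids is the per-pixel "sample id[k]" list in row-major order.
    A sample is gathered if it is a neighbor of at least one assigned pixel,
    so samples just outside the area (near the split line) are included too.
    """
    needed = set()
    for pixel, ids in enumerate(neighbor_ids):
        if x0 <= pixel % width < x1:
            needed.update(ids)
    return sorted(needed)
```

Samples near the split line appear in both GPUs' lists; tracing that small overlap is the price of reconstructing each area without fetching samples from another GPU's memory.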

Slide 27
RESULT
• More than 60 fps on 4 GPUs
  - 6M triangles
  - 32 shadow rays/sample
  - 2 AA rays/sample
[Image: Crytek Sponza (0.26M tris), ~12 ms/frame, 32 shadow rays/sample, 4x AMD FirePro W9000 GPUs]
[Image: Rungholt (6.7M tris), ~12 ms/frame, 32 shadow rays/sample, 4x AMD FirePro W9000 GPUs]

Slide 28
CLOSING THE TALK
• Showed an example of a rendering pipeline written 100% in GPU compute
• Showed how to extend a ray tracer for VR
• Showed a fully manual usage of multiple GPUs
  - As opposed to fully automatic usage by the driver (CrossFire)

Slide 29
CLOSING THE TALK
• Foveated Real-Time Ray Tracing for Virtual Reality Headset
• Ray Tracing Irregularly Distributed Samples on Multiple GPUs
• http://research.lighttransport.com/foveated-real-time-ray-tracing-for-virtual-reality-headset/index.html
• Thanks to Masahiro Fujita @ Light Transport Entertainment Inc.