The document discusses Fabric Engine, a platform for building high-performance 3D applications optimized for parallelism. It uses the KL programming language and targets CPUs and GPUs, including new HSA GPUs. The author details how Fabric Engine allows KL code to run on GPUs via the HSA architecture with minimal changes, enabling more developers to leverage the power of GPUs for digital content creation.
HSA and Fabric Engine: Boosting 3D Content Creation Performance
1. HSA AND FABRIC ENGINE: A GAME CHANGER FOR DIGITAL CONTENT CREATION
PETER ZION, CHIEF ARCHITECT, FABRIC ENGINE INC.
2. PERFORMANCE AND 3D
• Performance is very important for high-end 3D
  ‒ Simulations: particles, crowds, materials, hair
  ‒ Rendering: scene culling, subdivisions, path tracing
• Quality of 3D content is largely driven by available performance
HSA AND FABRIC ENGINE: A GAME CHANGER FOR DIGITAL CONTENT CREATION | NOVEMBER 19, 2013 | CONFIDENTIAL
3. PERFORMANCE AND 3D
• GPU came from 3D, but still mostly used for rendering in high-end 3D content creation
  ‒ GPU compute is domain of “ninja coders”
  ‒ Still often done through “shader hacks”!
• Need to democratize the GPU!
4. WHAT IS FABRIC ENGINE?
• A high-performance platform for building 3D applications, effects and tools.
  ‒ Optimized native code
  ‒ Parallelism
  ‒ High-end 3D for media and entertainment
• Applications can be standalone and/or embedded in DCCs (Maya, Softimage, 3DSMax, …)
5. WHAT IS FABRIC ENGINE?
• Fabric Engine SIGGRAPH 2013 teaser video: http://vimeo.com/70421665
6. WHAT IS FABRIC ENGINE?
• Applications are a combination of Python (or a DCC) and KL
  ‒ Python/DCC: UI, construction of 3D scenes
  ‒ KL: rendering, simulation, effects and data import/export
  ‒ Python/DCC drives execution of KL code
7. HORDE
• Horde: High-End Crowd Simulation
  ‒ Thousands of interacting characters
  ‒ Rigging (puppetry) of each character
  ‒ Behaviour of characters
  ‒ A typical Fabric Engine application
8. THE KL LANGUAGE
• Procedural
• JavaScript-like syntax
• Rich type system
  ‒ Integers, Booleans, Floats, Strings
  ‒ Fixed- and variable-size arrays; dictionaries
  ‒ Structures and Objects
• Pointer-free
9. THE KL LANGUAGE
• A simple language
  ‒ Accessible to “technical artists”
10. THE KL LANGUAGE
• KL is built on LLVM
  ‒ Targets many platforms
  ‒ Rich optimizations
  ‒ Amazing API
• KL was originally designed with only CPUs in mind
  ‒ Can it target the GPU?
11. SUPPORTING HSA GPUS
• Goals
  ‒ Allow most KL code to run without modification on HSA GPUs
  ‒ Allow KL code on CPU to perform a parallel evaluation of other KL code on GPU
  ‒ Make memory management as easy as possible
12. SUPPORTING HSA GPUS
• Video demo of Maya integration of water simulation running on HSA inside Maya
13. SUPPORTING HSA GPUS
• Challenges
  ‒ KL runtime library is C++
  ‒ Multiple address spaces on GPUs
  ‒ KL is high-level
    ‒ Dynamic memory management
    ‒ Exceptions
    ‒ “Virtual functions”
14. STAGE ONE
• Goal: get compiler unit tests passing on GPU
• Convert KL runtime library to LLVM IR
• Support multiple address spaces
  ‒ Automatic regeneration of LLVM functions for correct address spaces
• Create HSA-based test harness
15. KL RUNTIME LIBRARY
• Originally, KL runtime library was written in C++
  ‒ Not GPU-compatible
• LLVM is very good at inlining
• Entire runtime library was converted into code that builds LLVM IR
  ‒ Effectively, runtime library is now dynamically compiled
  ‒ Very low level, e.g. conversion of float to string
16. MULTIPLE ADDRESS SPACES
• GPU differentiates between pointers to private, local and global memory
• Rewrote KL code generators to account for address spaces
  ‒ If same function is used with two different combinations of pointer type, function is generated twice
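The "generated twice" rule amounts to specializing a function once per combination of pointer address spaces and reusing the result afterwards. The Python sketch below illustrates only that bookkeeping. `AddressSpace`, `SpecializationCache` and the name-mangling scheme are hypothetical and not taken from Fabric Engine's code generator, and the "generated code" is a plain signature string rather than real LLVM IR.

```python
from enum import Enum


class AddressSpace(Enum):
    """The three GPU address spaces named on the slide."""
    PRIVATE = "private"
    LOCAL = "local"
    GLOBAL = "global"


class SpecializationCache:
    """Generates one variant of a function per address-space combination."""

    def __init__(self):
        self._variants = {}  # (name, spaces) -> generated signature
        self.generated = 0   # number of variants actually generated

    def get(self, name, param_types, spaces):
        key = (name, tuple(spaces))
        if key not in self._variants:
            # First use of this combination: generate a new variant.
            params = ", ".join(
                f"{space.value} {ty}*" for ty, space in zip(param_types, spaces)
            )
            suffix = "_".join(space.value for space in spaces)
            self._variants[key] = f"void {name}__{suffix}({params})"
            self.generated += 1
        return self._variants[key]


cache = SpecializationCache()
G, P = AddressSpace.GLOBAL, AddressSpace.PRIVATE
first = cache.get("copy", ["float", "float"], [G, P])
again = cache.get("copy", ["float", "float"], [G, P])  # reused, not regenerated
other = cache.get("copy", ["float", "float"], [G, G])  # new combination
print(first)
print(other)
print(cache.generated)
```

Here the same `copy` function is requested with two different pointer combinations, so two variants exist; the repeated request hits the cache.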
17. KL UNIT TESTS
• KL has a rich set of unit tests (~400 tests)
• GPU test harness was easy to write
  ‒ HSA runtime API
  ‒ Pass LLVM IR to AMD compiler library in place of OpenCL
  ‒ Simulate a heap and “printf”
• A few HSA-related problems in our code
  ‒ Alignment, global initialization, intrinsics
18. STAGE ONE RESULTS
• Vast majority of KL unit tests pass on HSA
  ‒ Failures are very isolated
    ‒ e.g. unsupported transcendentals
  ‒ LLVM IR -> HSAIL path in AMD compiler library is stable
19. STAGE TWO
• Goal: support trampoline from CPU to GPU
  ‒ Meaning: GPU kernel execution from KL code running on CPU
  ‒ GPU-enable parallel execute (PEX) operation
• Use OpenGL interop for direct rendering
20. PARALLEL EXECUTE (PEX) OPERATION
• KL parallel PEX primitive adapted for GPU execution
  ‒ Simple one-dimensional parallel call
  ‒ Decision to run on GPU made at runtime
21. PARALLEL EXECUTE (PEX) OPERATION
// GPU kernel: invoked once per index
operator gpuKernel<<<index>>>(MyStruct myStruct) {
  report("[" + index + "]: myStruct=" + myStruct);
}

operator cpuKernel() {
  UInt32 count = 4096;
  Boolean useGPU = true;
  MyStruct myStruct;
  // Execute gpuKernel 4096 times on GPU
  gpuKernel<<<count@useGPU>>>(myStruct);
}
22. PARALLEL EXECUTE (PEX) OPERATION
• KL parallel PEX primitive adapted for GPU execution
  ‒ Compiles KL code to GPU kernel (if not cached)
  ‒ Creates “trampoline” from CPU to HSA in CPU code
  ‒ Passes arguments to kernel
    ‒ Direct values or pointers to shared memory
  ‒ Calls HsaSubmitAql
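The sequence above (compile unless cached, decide CPU or GPU at runtime, pass arguments, run once per index) can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: `ParallelExecutor`, `pex` and `scale` are hypothetical names, the "GPU" path is simulated by an ordinary loop, and no HSA runtime call is made.

```python
class ParallelExecutor:
    """Toy model of a one-dimensional parallel execute (PEX) call."""

    def __init__(self, gpu_available):
        self.gpu_available = gpu_available
        self._kernel_cache = {}  # kernel name -> "compiled" kernel
        self.compile_count = 0

    def _compile_for_gpu(self, kernel):
        # Stand-in for compiling to a GPU kernel: only records that a
        # compile happened, then returns the cached callable.
        if kernel.__name__ not in self._kernel_cache:
            self._kernel_cache[kernel.__name__] = kernel
            self.compile_count += 1
        return self._kernel_cache[kernel.__name__]

    def pex(self, kernel, count, use_gpu, *args):
        # The CPU/GPU decision is made at run time, per call.
        on_gpu = use_gpu and self.gpu_available
        target = self._compile_for_gpu(kernel) if on_gpu else kernel
        # Simulated dispatch: real hardware would run these in parallel.
        for index in range(count):
            target(index, *args)
        return "gpu" if on_gpu else "cpu"


def scale(index, data, factor):
    # Kernel body: each invocation handles exactly one index.
    data[index] *= factor


executor = ParallelExecutor(gpu_available=True)
values = [1.0] * 8
where_first = executor.pex(scale, len(values), True, values, 2.0)
where_second = executor.pex(scale, len(values), True, values, 2.0)
where_cpu = executor.pex(scale, len(values), False, values, 0.5)
print(where_first, where_second, where_cpu, executor.compile_count)
```

The second GPU call reuses the cached kernel, so only one compile is recorded, and the same kernel body serves both the CPU and the simulated GPU path.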
23. MEMORY REGISTRATION
• HSA runtime: All memory shared between CPU and HSA must be registered
  ‒ HsaRegisterSystemMemory
  ‒ For dynamic memory, this is easy
    ‒ HSA runtime provides a heap!
  ‒ What about variables allocated on CPU stack?
24. MEMORY REGISTRATION
operator cpuCode() {
  UInt32 count = 4096;
  Boolean useGPU = true;
  MyStruct myStructOnStack; // lives on the CPU stack
  // Execute gpuKernel 4096 times on GPU
  gpuKernel<<<count@useGPU>>>(myStructOnStack);
}
25. MEMORY REGISTRATION
• Solution: alternate stack
  ‒ Register stack for each CPU thread in HSA-registered memory
  ‒ Every call to KL code “trampolines” to registered stack
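Why an alternate stack helps can be shown with a small address-range model: a variable is shareable only if its address falls inside a registered region, so placing the whole stack in registered memory makes every stack variable shareable. The Python sketch below is an illustration only. `RegisteredMemory`, `ThreadStack` and all addresses are made up, and nothing here calls the HSA runtime.

```python
class RegisteredMemory:
    """Tracks address ranges that have been registered for GPU access."""

    def __init__(self):
        self._ranges = []  # list of (base, size)

    def register(self, base, size):
        self._ranges.append((base, size))

    def contains(self, address, size=1):
        # True if [address, address + size) lies inside a registered range.
        return any(
            base <= address and address + size <= base + range_size
            for base, range_size in self._ranges
        )


class ThreadStack:
    """A downward-growing stack occupying [base, base + size)."""

    def __init__(self, base, size):
        self.base = base
        self.top = base + size

    def alloca(self, size):
        self.top -= size
        if self.top < self.base:
            raise MemoryError("stack overflow")
        return self.top


STACK_SIZE = 0x10_0000
registry = RegisteredMemory()

# Ordinary OS stack: never registered, so its variables are not shareable.
os_stack = ThreadStack(base=0x7000_0000, size=STACK_SIZE)
plain_var = os_stack.alloca(64)

# Alternate stack: the whole region is registered once, up front.
alt_stack = ThreadStack(base=0x2000_0000, size=STACK_SIZE)
registry.register(alt_stack.base, STACK_SIZE)
shared_var = alt_stack.alloca(64)

print(registry.contains(plain_var, 64))   # False
print(registry.contains(shared_var, 64))  # True
```

Registering the region once per thread avoids registering each stack variable individually before every kernel launch.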
26. DYNAMIC MEMORY ALLOCATION
• KL supports dynamic allocation
  ‒ Internal to types (e.g. variable-length arrays, strings)
  ‒ HsaAllocateSystemMemory on CPU
  ‒ Well-known GPU allocation algorithms
    ‒ e.g. ScatterAlloc
  ‒ What about mixed allocation?
27. DYNAMIC MEMORY ALLOCATION
operator cpuKernel() {
  UInt32 a[][];
  a.resize(4096); // alloc CPU mem
  for (Index i=0; i<4096; ++i)
    a[i].resize(i%32); // alloc CPU mem
  gpuKernel<<<4096@true>>>(a);
  a.clear(); // free GPU mem and CPU mem
}

operator gpuKernel<<<index>>>(UInt32 a[][]) {
  a[index].resize(index%64); // free CPU mem, alloc GPU mem
}
28. DYNAMIC MEMORY ALLOCATION
• How to manage mixed allocation?
  ‒ Defer incompatible frees
    ‒ GPU kernels atomically append GPU pointers to be freed to a list
    ‒ CPU frees pointers when kernel finishes
  ‒ CPU can free GPU pointers
    ‒ Using either system atomics or a simple mutex
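The deferred-free idea can be sketched in Python as a shared list that kernel invocations append to and the CPU drains once the kernel has finished. This is one reading of the slide, not Fabric Engine's implementation: `DeferredFreeList`, the integer "pointers" and the set standing in for the CPU heap are hypothetical, and a lock stands in for the atomic append.

```python
import threading


class DeferredFreeList:
    """Collects pointers that a kernel could not free itself."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for an atomic append
        self._pending = []

    def defer(self, pointer):
        with self._lock:
            self._pending.append(pointer)

    def drain(self):
        # Hand back everything queued so far and reset the list.
        with self._lock:
            pending, self._pending = self._pending, []
        return pending


cpu_heap = {100, 101, 102, 103}  # live allocations, as fake pointers
deferred = DeferredFreeList()


def kernel(index, pointers):
    # The kernel side cannot call the other side's allocator, so it
    # queues the old pointer instead of freeing it directly.
    deferred.defer(pointers[index])


pointers = sorted(cpu_heap)
workers = [
    threading.Thread(target=kernel, args=(i, pointers)) for i in range(4)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()  # "kernel finishes"

# CPU side: free everything the kernel queued.
for pointer in deferred.drain():
    cpu_heap.discard(pointer)

print(len(cpu_heap))  # 0
```

Because the frees happen only after all workers have joined, no kernel invocation can observe a pointer that has already been released.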
29. STAGE TWO RESULTS
• For command-line tests (e.g. naïve matrix multiplies): 5x-15x performance improvement
• For real-world tests (e.g. embedded in UI): up to 5x performance improvement
• 3D effects can be run in real-time
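For readers unfamiliar with the benchmark, a naïve matrix multiply maps directly onto a one-dimensional parallel call: each index computes one output element. The Python sketch below shows only that decomposition. `matmul_kernel` is a hypothetical name, the loop is a sequential stand-in for the parallel dispatch, and the code says nothing about the measured speedups.

```python
def matmul_kernel(index, a, b, out, n):
    """Computes one output element; index runs over all n * n elements."""
    row, col = divmod(index, n)
    out[index] = sum(a[row * n + k] * b[k * n + col] for k in range(n))


n = 2
a = [1.0, 2.0,
     3.0, 4.0]
b = [5.0, 6.0,
     7.0, 8.0]
out = [0.0] * (n * n)

# Sequential stand-in for a parallel execute over n * n indices.
for index in range(n * n):
    matmul_kernel(index, a, b, out, n)

print(out)  # [19.0, 22.0, 43.0, 50.0]
```

Each invocation reads shared inputs and writes a distinct output slot, so the invocations are independent and can run in any order.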
30. STAGE TWO RESULTS
• Paradigm shift for programmatic effects
  ‒ Technical artists can make run-time changes to GPU code and see the results in real-time
31. ONGOING WORK
• OpenGL interop
  ‒ Tag KL arrays as bound to VBOs
• GPU-to-GPU PEX
• Virtual functions on GPU
• Debugger for GPU