2. REYES BIRTH
rendering algorithm developed by Robert L. Cook, Loren Carpenter and Edwin Catmull at Lucasfilm’s
research group
name: Renders Everything You Ever Saw
made with goal of film-quality CGI: smooth surfaces, shading, depth of field and motion blur at a
reasonable speed
raytracing was too slow at the time: its memory consumption and execution time were too high
first used in the movie Star Trek II: The Wrath of Khan (1982)
used for a long time as the primary method for offline rendering (Pixar was the leader in this area)
first feature-length computer animated movie: Toy Story (1995)
3. REYES FEATURES
rasterization-based algorithm with adaptive tessellation for rendering analytic surfaces with support for
high-quality camera effects via stochastic sampling
features:
analytic surfaces (parametric shapes, Catmull-Clark subdivision surfaces, Bezier patches, NURBS…)
programmable shaders (usually RenderMan interface compliant)
stochastic rasterization (antialiasing, depth of field, motion blur)
order independent transparency (A-buffer)
parallelizable (SIMD, multicore, network distributed…)
5. DATA STRUCTURES
Primitive/Shape
represents analytic shape (mathematically described geometrical
surface using UV parameters or similar)
Grid/Microgrid
grid of (displaced and shaded) vertices after the primitive was
diced
A-buffer
pixel buffer that holds per-sample list of values for each sample
location
struct Vertex
{
    RGBA color;        // shaded color
    Vector3 position;  // (displaced) position
};
typedef Vertex Grid[GRID_WIDTH][GRID_HEIGHT];
struct PixelSample
{
    RGBA color;
    float depth;         // z-coord
    PixelSample* pNext;  // next sample recorded at the same location
};
typedef PixelSample ABuffer[RES_X][RES_Y][SAMPLES_PER_PIXEL];
class Shape
{
public:
    virtual Vector3 getNormal(UV coord);
    virtual Vector3 getPosition(UV coord);
};
6. TESSELLATION - CONCEPTUALLY
the idea of the tessellation phase is to adaptively tessellate the
shape into micropolygons that are each smaller than about one
pixel (area < 1px)
tessellation is done in three steps
shape’s size is estimated in screen-space (BOUND)
if (size of the shape is “small enough” so that, after the dicing,
micropolygons in the grid are smaller than the pixel)
dice the shape (DICE)
continue onto next phase (shading and sampling)
else
split the shape into smaller sub-shapes (SPLIT) and put each of
them into a queue for further tessellation
[Diagram: Shape → BOUND → AABB → SMALL ENOUGH? → DICE → Microgrid, otherwise SPLIT → Shape[ ] → back to BOUND]
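The bound/split/dice loop above can be sketched with a simple work queue. This is only a minimal sketch under stated assumptions, not the code of any particular renderer: `Patch`, `boundPixels`, `split`, `tessellate` and `MAX_GRID_DIM` are hypothetical names, and the shape is assumed to be a flat patch whose on-screen size scales linearly with its UV interval.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// Hypothetical stand-in for a shape: a UV sub-interval of a flat patch whose
// full [0,1]x[0,1] domain covers fullW x fullH pixels on screen (assumption).
struct Patch {
    float u0, u1, v0, v1;
    float fullW, fullH;
};

struct Extent { float w, h; };

// BOUND: estimated screen-space extent of the patch, in pixels.
inline Extent boundPixels(const Patch& p) {
    return { (p.u1 - p.u0) * p.fullW, (p.v1 - p.v0) * p.fullH };
}

// SPLIT: halve the parametric interval along the axis that is longer on screen.
inline std::vector<Patch> split(const Patch& p) {
    Extent e = boundPixels(p);
    Patch a = p, b = p;
    if (e.w >= e.h) { float m = 0.5f * (p.u0 + p.u1); a.u1 = m; b.u0 = m; }
    else            { float m = 0.5f * (p.v0 + p.v1); a.v1 = m; b.v0 = m; }
    return { a, b };
}

constexpr int MAX_GRID_DIM = 16;  // assumed microgrid resolution (quads per side)

// Returns the patches that are small enough to dice; each of them would then
// continue to DICE, shading and sampling.
inline std::vector<Patch> tessellate(const Patch& root) {
    std::vector<Patch> ready;
    std::queue<Patch> work;
    work.push(root);
    while (!work.empty()) {
        Patch p = work.front();
        work.pop();
        Extent e = boundPixels(p);
        // Small enough: a MAX_GRID_DIM x MAX_GRID_DIM grid would give quads
        // of at most one pixel per side.
        if (std::max(e.w, e.h) <= MAX_GRID_DIM) {
            ready.push_back(p);
        } else {
            for (const Patch& s : split(p)) work.push(s);
        }
    }
    return ready;
}
```

With these assumptions, a patch covering 64x32 pixels ends up as eight 16x16-pixel sub-patches before dicing.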
7. BOUND
estimate the screen-space size of the shape
calculate the screen-space AABB and compare it to some predefined threshold
the bound phase should account for the vertex displacement that happens after the dice phase
bounding the shape:
coarsely dice the shape
displace the vertices in the microgrid
project the vertices
using image properties, estimate the pixel-sized bounding box
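The bounding steps listed above could look roughly like the sketch below. It assumes a simple pinhole projection (camera at the origin looking down +z, all vertices in front of the camera) and that the coarse grid has already been diced and displaced; `AABB2`, `project`, `boundGrid` and `smallEnough` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };

// Screen-space axis-aligned bounding box, in pixels.
struct AABB2 {
    float minX, minY, maxX, maxY;
    float width()  const { return maxX - minX; }
    float height() const { return maxY - minY; }
};

// Assumed pinhole projection: `focal` is expressed in pixels, the image
// centre is at (cx, cy), and p.z > 0 for every vertex.
inline void project(const Vec3& p, float focal, float cx, float cy,
                    float& sx, float& sy) {
    sx = cx + focal * p.x / p.z;
    sy = cy + focal * p.y / p.z;
}

// BOUND: project every (already displaced) vertex of the coarse grid and
// grow the box around the projected points.
inline AABB2 boundGrid(const std::vector<Vec3>& verts,
                       float focal, float cx, float cy) {
    const float inf = std::numeric_limits<float>::infinity();
    AABB2 box{ inf, inf, -inf, -inf };
    for (const Vec3& v : verts) {
        float sx, sy;
        project(v, focal, cx, cy, sx, sy);
        box.minX = std::min(box.minX, sx);
        box.minY = std::min(box.minY, sy);
        box.maxX = std::max(box.maxX, sx);
        box.maxY = std::max(box.maxY, sy);
    }
    return box;
}

// The shape is "small enough" when dicing it at gridDim x gridDim would give
// micropolygons of at most about one pixel per side.
inline bool smallEnough(const AABB2& box, int gridDim) {
    return box.width() <= gridDim && box.height() <= gridDim;
}
```

Comparing the box against the grid resolution is one possible threshold; a real renderer may use a different metric (for example the projected area).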
8. SPLIT
splitting analytic surfaces is usually a very lightweight process: it creates new shapes
that simply take sub-intervals of the parametric surface that defines them
choosing a good split direction is important
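One possible heuristic for picking the split direction is to compare how long the patch is on screen along U and along V, using the projected vertices of the coarse grid from the bound phase. The sketch below is an assumption for illustration, not the method of any specific renderer; `chooseSplitAxis`, `splitRange` and the row-major grid layout are hypothetical.

```cpp
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };
enum class SplitAxis { U, V };

inline float dist(const Vec2& a, const Vec2& b) {
    return std::hypot(b.x - a.x, b.y - a.y);
}

// `pts` holds nu x nv projected grid vertices, row-major: index = v * nu + u.
// Requires nu >= 1 and nv >= 1.
inline SplitAxis chooseSplitAxis(const std::vector<Vec2>& pts, int nu, int nv) {
    float lenU = 0.0f, lenV = 0.0f;
    for (int v = 0; v < nv; ++v)
        for (int u = 0; u + 1 < nu; ++u)
            lenU += dist(pts[v * nu + u], pts[v * nu + u + 1]);
    for (int v = 0; v + 1 < nv; ++v)
        for (int u = 0; u < nu; ++u)
            lenV += dist(pts[v * nu + u], pts[(v + 1) * nu + u]);
    // Compare average polyline lengths so that nu != nv does not bias the choice.
    float avgU = lenU / nv;  // there are nv polylines running along U
    float avgV = lenV / nu;  // there are nu polylines running along V
    return avgU >= avgV ? SplitAxis::U : SplitAxis::V;
}

struct UVRange { float u0, u1, v0, v1; };

// The split itself is cheap: each child just gets half of the parent's interval.
inline void splitRange(const UVRange& r, SplitAxis axis, UVRange& a, UVRange& b) {
    a = r;
    b = r;
    if (axis == SplitAxis::U) {
        float m = 0.5f * (r.u0 + r.u1);
        a.u1 = m;
        b.u0 = m;
    } else {
        float m = 0.5f * (r.v0 + r.v1);
        a.v1 = m;
        b.v0 = m;
    }
}
```

Splitting along the axis that is longer on screen tends to shrink the bounding box faster than splitting along the shorter one.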
9. DICE
this step requires a microgrid with an array of vertices that are sampled uniformly across the shape’s
surface and whose values are evaluated using the shape’s information
if the shape was already diced finely enough in the bound phase for the virtual micropolygons of the microgrid to
be smaller than about one pixel in area, then we can go straight to the next phase
Dice
(evaluate vertices in the grid)
Displace
(run “vertex shader”)
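A minimal sketch of dice followed by displace, assuming the shape can be evaluated as a function of (u, v) and that displacement is a callback run on every evaluated vertex. `Surface`, `Displace`, `Microgrid` and `dice` are hypothetical names, and the grid needs at least two vertices per side.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical analytic shape: maps (u, v) in [0,1]^2 to a point on the surface.
using Surface = std::function<Vec3(float u, float v)>;
// Hypothetical displacement program ("vertex shader"): moves an evaluated point.
using Displace = std::function<Vec3(const Vec3& p, float u, float v)>;

struct Microgrid {
    int nu, nv;               // vertices per side
    std::vector<Vec3> verts;  // row-major, index = j * nu + i
};

// DICE: sample the surface uniformly in UV, then DISPLACE each vertex.
// Requires nu >= 2 and nv >= 2.
inline Microgrid dice(const Surface& S, const Displace& D, int nu, int nv) {
    Microgrid g{ nu, nv, {} };
    g.verts.reserve(static_cast<std::size_t>(nu) * static_cast<std::size_t>(nv));
    for (int j = 0; j < nv; ++j) {
        for (int i = 0; i < nu; ++i) {
            float u = static_cast<float>(i) / static_cast<float>(nu - 1);
            float v = static_cast<float>(j) / static_cast<float>(nv - 1);
            g.verts.push_back(D(S(u, v), u, v));
        }
    }
    return g;
}
```

The same function can serve the coarse dice in the bound phase and the final dice, just with different `nu` and `nv`.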
11. SHADING
for each vertex in the microgrid, a coloring shader is executed, outputting the color of the vertex
this is where all the programmable features of the shading system come into play
logically, this is also where “primitive assembly” happens, because further stages will not look at one
vertex at a time, but rather at the micropolygon as a whole, together with the vertices that make
it up
micropolygons can be triangles, quads or general polygons
(basic geometrical primitives that can actually be rasterized)
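As a rough illustration of the “primitive assembly” step, the sketch below turns a microgrid of nu x nv vertices into quad micropolygons by index. The row-major vertex layout and the names `Quad` and `assembleQuads` are assumptions made for this example.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// One quad micropolygon, stored as four indices into the microgrid's
// vertex array (row-major layout assumed: index = j * nu + i).
using Quad = std::array<int, 4>;

inline std::vector<Quad> assembleQuads(int nu, int nv) {
    std::vector<Quad> quads;
    if (nu < 2 || nv < 2) return quads;  // not enough vertices for a quad
    quads.reserve(static_cast<std::size_t>(nu - 1) * static_cast<std::size_t>(nv - 1));
    for (int j = 0; j + 1 < nv; ++j) {
        for (int i = 0; i + 1 < nu; ++i) {
            int a = j * nu + i;  // corner with the lowest (i, j) in this cell
            // Walk around the cell so that consecutive indices share an edge.
            quads.push_back(Quad{{ a, a + 1, a + nu + 1, a + nu }});
        }
    }
    return quads;
}
```

A grid of nu x nv vertices yields (nu - 1) x (nv - 1) quads; splitting each quad into two triangles would be a small extension.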
12. SAMPLING
each micropolygon is rasterized
color is interpolated between the vertices of the micropolygon at each sample point within each
partially-covered pixel
the number of sampling locations and their positions within the pixel can be chosen to best fight aliasing and
to give high-quality results for the depth of field and motion blur effects
[Figures: sample pattern; sampling the color]
13. SAMPLING
stochastic sampling is done at predefined sample locations but with added random jitter
for the depth of field effect, the CoC is calculated and then the sampled color is spread across the
sample locations in the A-buffer that would be affected by the CoC
for motion blur, all the sampling locations that would be affected by the motion of the shape are
resampled stochastically and injected into the A-buffer
the results of each sampling operation are written to the A-buffer,
gradually filling it with a per-sample-location list of all the samples
that affect it
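“Predefined sample locations with added random jitter” can be sketched as a stratified pattern: the pixel is divided into an n x n grid of cells and one sample is placed at a random position inside each cell. This is a generic jittered pattern, assumed here for illustration; `jitteredPattern` is a hypothetical name.

```cpp
#include <cstddef>
#include <random>
#include <vector>

struct Vec2 { float x, y; };

// Jittered (stratified) sample pattern inside the unit pixel: the pixel is cut
// into n x n cells and each cell receives one randomly placed sample.
// Samples are returned row-major: index = j * n + i. Requires n >= 1.
inline std::vector<Vec2> jitteredPattern(int n, std::mt19937& rng) {
    std::uniform_real_distribution<float> jitter(0.0f, 1.0f);
    std::vector<Vec2> samples;
    samples.reserve(static_cast<std::size_t>(n) * static_cast<std::size_t>(n));
    const float cell = 1.0f / static_cast<float>(n);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            float x = (static_cast<float>(i) + jitter(rng)) * cell;
            float y = (static_cast<float>(j) + jitter(rng)) * cell;
            samples.push_back(Vec2{ x, y });
        }
    }
    return samples;
}
```

Keeping one sample per cell preserves even coverage of the pixel, while the jitter trades regular aliasing patterns for less objectionable noise.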
14. RESOLVE
final stage is color resolve
using the A-buffer, for each pixel, mix the recorded samples
for each sample, choose a filter and combine with all the others
respect the depth order while resolving transparency
output the final color of the pixel in the framebuffer
[Figure: Pixel 1, Pixel 2, Pixel 2 mix]
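A minimal sketch of the resolve step, assuming a box filter (plain average over the sample locations of a pixel) and front-to-back “over” compositing of each location's list. `Fragment`, `resolveSample` and `resolvePixel` are hypothetical names, and the result is returned with premultiplied alpha.

```cpp
#include <algorithm>
#include <vector>

struct RGBA { float r, g, b, a; };  // non-premultiplied color, alpha in [0,1]
struct Fragment { RGBA color; float depth; };

// Composite the list of one sample location front to back ("over" operator).
// The list is taken by value so it can be sorted by depth locally.
inline RGBA resolveSample(std::vector<Fragment> frags) {
    std::sort(frags.begin(), frags.end(),
              [](const Fragment& a, const Fragment& b) { return a.depth < b.depth; });
    RGBA out{ 0.0f, 0.0f, 0.0f, 0.0f };  // premultiplied accumulator
    for (const Fragment& f : frags) {
        float w = (1.0f - out.a) * f.color.a;  // remaining visibility * opacity
        out.r += w * f.color.r;
        out.g += w * f.color.g;
        out.b += w * f.color.b;
        out.a += w;
        if (out.a >= 0.999f) break;  // nothing behind this is visible
    }
    return out;
}

// Box filter (assumption): average the resolved color of every sample location.
inline RGBA resolvePixel(const std::vector<std::vector<Fragment>>& samples) {
    RGBA sum{ 0.0f, 0.0f, 0.0f, 0.0f };
    if (samples.empty()) return sum;
    for (const std::vector<Fragment>& s : samples) {
        RGBA c = resolveSample(s);
        sum.r += c.r; sum.g += c.g; sum.b += c.b; sum.a += c.a;
    }
    const float inv = 1.0f / static_cast<float>(samples.size());
    return RGBA{ sum.r * inv, sum.g * inv, sum.b * inv, sum.a * inv };
}
```

Because the list is sorted during resolve, fragments can be written to the A-buffer in any order, which is what makes the transparency order independent. Other reconstruction filters would replace the plain average with per-sample weights.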
17. OPTIMIZATIONS
bucketing
find the “sweet spot” for performance by balancing the parameters
A-buffer ops, fast discard
dynamic dicing
parallelization (SIMD)
cache utilization
18. OVERVIEW
PROS
high-quality imagery
lens effects via stochastic sampling
antialiasing and OIT via A-buffer
flexible shading
parallelizable
CONS
cracks in geometry due to neighboring patches being diced at different levels
wasteful uniform dicing for very skewed patches
huge memory consumption
requires additional post-processing for GI effects
“deprecated”
19. REYES TODAY
smooth surfaces are no longer exclusive to Reyes: Bezier patches and subdivision surfaces can be
raytraced via Bezier clipping and similar techniques (implemented in RIS, although Reyes-based algorithms
still exist)
pathtracing using robust methods such as VCM seems like the future of high-quality CGI
Reyes-like approaches to solving problems adaptively and in tiles are still in use today:
desktop GPUs optimize much of the shading and culling through tiling (as do PowerVR and other mobile GPUs)
object-space, adaptive, decoupled deferred shading and shading reuse techniques for real-time rendering
screen-space tessellation is being reconsidered for use with new APIs (Vulkan, DX12)
Reyes might not be used actively, but lessons learned from it can still be applied in many forms today
20. CODE
check out the https://github.com/AbstractAlgorithm/reyes/tree/dev repository for an example
C++ implementation of the Reyes architecture (currently w/o stochastic sampling)
see the repo’s description for the entire list of features
pipeline, with bound and cull test: pipeline.hpp
Shape structure: shape.hpp
Microgrid structure with AABB function used for bound phase: grid.hpp
dice: dice function
sampling that utilizes OpenGL’s rasterization with simple Z-buffer: rasterize function
more to come…
21. REFERENCES
“The Reyes Rendering Architecture”, Alexander Boswell (part 1, part 2)
“Implementing a modern, RenderMan compliant, REYES renderer”, Davide Pasca (slideshare)
“Reyes architecture and Implementation”, Kayvon Fatahalian
“A real-time micropolygon rendering”, Kayvon Fatahalian
“Parallel micropolygon rendering”, course material (pdf slides)
“Real-time Reyes-style Adaptive Surface Subdivision”, Patney and Owens
“RenderAnts: Interactive Reyes rendering on GPUs”, Kun Zhou et al.
additional resources at github.com/AbstractAlgorithm/reyes
I borrowed some of the images from Davide Pasca’s slides, Alexander Boswell’s blog and Scratchapixel
website, it’s purely for educational purpose
The people who developed Reyes later founded Pixar. Reyes was developed as a more practical alternative to raytracing at the time. It was Pixar’s main rendering algorithm for a long time, and it still exists today, although pathtracing is now the primary method thanks to more powerful hardware and newer techniques that help with pathtracing smooth surfaces.
Reyes is a rasterization algorithm. But since its input is analytic geometry, the algorithm tessellates the input geometry until the pieces fall below a certain size threshold, and those pieces are then shaded. Using stochastic sampling, the algorithm fights aliasing well and produces high-quality lens effects such as depth of field (defocus) and motion blur. To support transparency, an A-buffer is used as the data structure that holds the shading data. The basic setup of the algorithm makes it very parallelizable, and the work can be distributed among different types of systems.
Tessellation phase: bound, split, dice
Rasterization: shading, sampling
Resolve: accumulation of the data
Primitive is the main geometrical input to the Reyes renderer. It is the main way to describe a scene.
Grid serves as an intermediate geometrical representation suitable for later rasterization. The grid itself provides the information about the polygons that make up the diced geometry. Those tiny polygons are called micropolygons, hence the name microgrid.
A-buffer is an advanced version of the pixel storage buffer. A common Z-buffer usually holds only one sample per pixel (with very limited support for transparency, optional multisampling, and storing only the sample with the least depth, i.e. the one closest to the camera); the A-buffer is made to support all the needs that a renderer might have. Multiple samples per pixel ensure good antialiasing, while the list of all the samples at each location is useful for order-independent transparency calculations in the final phase of the renderer, when the image is resolved.
Bound determines whether the shape needs finer tessellation. To estimate that, we calculate the approximate image area that the shape covers. That area is compared to the resolution of the microgrid that will be used during dicing, and if dicing the shape would produce micropolygons of approximately <1px, then the shape is good to go.
A shape can have many bumps, so to account for them we need to dice the shape into some number of vertices and use all of them to calculate the bounding box.
Less coarse means more fine-grained, and more fine-grained usually means a better approximation of the displaced shape. However, being too detailed can lead to high memory consumption and too much time spent on this step, so it requires balance. Think of it like numerically estimating the integral of a function: the more intervals you split the function into, the more precise the result, but that precision comes at the cost of memory consumption and execution time.
Another tricky part is that, besides the surface of the shape itself, additional displacement of the vertices can happen in the shader. So, besides dicing the shape, we also need to run the displacement program (aka the vertex shader) on the vertices to get a good estimate of where the shape will end up and how big it will be.
Splitting usually involves determining the parametric subintervals that the newly created sub-shapes will cover.
Splitting can be done along both axes at the same time, or along one at a time. If the latter is used, it is important to choose the split axis carefully, because a poorly chosen axis might not shrink the bounding box as well as the other axis would. For certain types of geometry, the best splitting axis is very hard to find.
CoC – circle of confusion
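For reference, the circle of confusion can be computed with the standard thin-lens model; whether a given Reyes renderer uses exactly this formula is an assumption. In the sketch below, `cocDiameter` and `cocPixels` are hypothetical names, all distances share one unit, and the focus distance must be greater than the focal length.

```cpp
#include <cmath>

// Thin-lens circle of confusion, measured on the sensor plane.
//   aperture  - diameter of the lens aperture
//   focalLen  - focal length of the lens
//   focusDist - distance at which the lens is focused (> focalLen)
//   objDist   - distance of the point being shaded (> 0)
inline float cocDiameter(float aperture, float focalLen,
                         float focusDist, float objDist) {
    return aperture * focalLen * std::fabs(objDist - focusDist)
         / (objDist * (focusDist - focalLen));
}

// Convert a diameter on the sensor into pixels for a given sensor and image width.
inline float cocPixels(float cocOnSensor, float sensorWidth, int imageWidth) {
    return cocOnSensor / sensorWidth * static_cast<float>(imageWidth);
}
```

A point exactly at the focus distance gets a CoC of zero and stays sharp; the farther a point is from that distance, the more sample locations its color is spread over.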
On left – shading and sampling
On right – resolve phase
On left – tessellation phase with dice, displace, bound and split sub-stages
On right – rasterization phase (with shading and sampling sub-stages) and resolve phase
Bucketing splits the screen into smaller regions. Each region has its own A-buffer that is much smaller and fits into memory easily. For a 1920x1080 A-buffer with 64 spp, where each sample holds at least R8G8B8A8 + D32F and the sample lists have variable length, the A-buffer can take 10GB+. Bucketing solves this problem by dividing the job into smaller chunks. The drawback is that when a patch overlaps several neighboring buckets, some of the work has to be redone.
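The memory estimate above can be checked with a few lines of arithmetic. The sketch assumes 8 bytes per stored fragment (R8G8B8A8 + D32F, ignoring the list pointer); the average list depth of 10 fragments and the 32x32 bucket size are hypothetical values chosen only for illustration.

```cpp
#include <cstdint>

// Bytes needed by an A-buffer that stores, on average, `fragsPerSample`
// fragments at every sample location.
constexpr std::uint64_t aBufferBytes(std::uint64_t width, std::uint64_t height,
                                     std::uint64_t spp,
                                     std::uint64_t bytesPerFragment,
                                     std::uint64_t fragsPerSample) {
    return width * height * spp * bytesPerFragment * fragsPerSample;
}

constexpr std::uint64_t kFragmentBytes = 4 + 4;  // R8G8B8A8 + D32F

// Full frame with a single fragment per sample location: about 1.06 GB.
constexpr std::uint64_t fullFrameBytes =
    aBufferBytes(1920, 1080, 64, kFragmentBytes, 1);

// Full frame with an assumed average of 10 fragments per location: about 10.6 GB.
constexpr std::uint64_t fullFrameDeepBytes =
    aBufferBytes(1920, 1080, 64, kFragmentBytes, 10);

// One assumed 32x32-pixel bucket with a single fragment per location: 512 KiB.
constexpr std::uint64_t bucketBytes =
    aBufferBytes(32, 32, 64, kFragmentBytes, 1);
```

Under these assumptions a single bucket needs roughly two thousand times less memory than the full frame, which is why the per-bucket A-buffer fits comfortably in memory.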
Balancing is crucial. Choosing good dicing parameters and a good splitting threshold determines performance and memory consumption. This becomes even more important in GPU implementations of Reyes.
Many A-buffer ops can be optimized, as can the access pattern and the layout of the A-buffer itself. Tiling the buffer at different levels of granularity may be beneficial, but it also introduces the need for more memory and execution time.
Dynamic dicing and smart splitting should be considered. Many patches are very skewed on the screen, so dicing them uniformly along both the U and V axes is wasteful.
Many parts of the Reyes pipeline are easily parallelizable, but extra care must be taken with memory management so that the load is well balanced and the jobs do not “thrash”.
Jobs on micropolygons from the same grid are identical, so besides their SIMD nature, cache coherency can also be exploited well. Again, the biggest problem is memory consumption.
OIT – order independent transparency
Cracks are a known flaw in many tessellation algorithms: because the levels of tessellation are not uniform among patches that share edges, discontinuities can appear. They manifest as holes and unaligned edges. To address this problem, either additional processing must happen during dicing, or extra post-processing must stitch these cracks with additional geometry.
As mentioned, uniform dicing is usually wasteful and does not correspond well to on-screen coverage.
Dicing creates hundreds of micropolygons for each of the thousands or even millions of patches. This requires good memory management and smart balancing.
Reyes doesn’t support GI effects per se. Any such effects need to be added in post-processing, or during shading by providing additional information about the scene. In this respect, Reyes is very much like the rasterization seen in games.
More robust GI algorithms have since been developed, and they offer all the benefits that were once exclusive to Reyes, so the Reyes architecture is somewhat “deprecated”.
Direct Ray Tracing of Full-Featured Subdivision Surfaces with Bezier Clipping, Tejima et al. (http://jcgt.org/published/0004/01/04/)
Decoupled Deferred Shading for Hardware Rasterization, Gabor Liktor, Carsten Dachsbacher (http://cg.ivd.kit.edu/english/ShadingReuse.php)
Oxide Games is working on a DX12-based Reyes-like renderer with decoupled rasterization:
“While forward and deferred rendering have made huge advancements over the last decade, there are still key rendering issues that are difficult to address. Among them, are arbitrary material layering, decoupling shading rate from rasterization, and shader anti-aliasing. Object space lighting is a technique inspired by film rendering techniques like REYES. By reversing the process and shading as early as possible and not in rasterization space, we can achieve arbitrary material layering, shader anti-aliasing, decoupled shading rates, and many more effects, all in real-time.”
Read more: http://wccftech.com/ashes-singularity-dev-addresses-dx12-object-space-rendering-gdc-2016/
Oxide Games will give a presentation about their system at GDC 2016.