Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Embree Ray Tracing Kernels
Sven Woop
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,
BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF
SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS
OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR
PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE
OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND
AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE
ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH
MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL
PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice.
All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications.
Current characterized errata are available on request.
Optimized Intel® HD Graphics P3000 only available on select models of the Intel® Xeon® processor E3 family. To learn more about Intel Xeon processors for workstation
visit www.intel.com/go/workstation.
HD Graphics P4000 introduces four additional execution units, going from 8 in the HD P3000 to 12 in the HD P4000. Optimized Intel® HD Graphics P4000 only available on
select models of the Intel® Xeon® processor E3-1200 v2 product family. For more information, visit
http://www.intel.com/content/www/us/en/architecture-and-technology/hdgraphics/hdgraphics-developer.html
Iris™ graphics is available on select systems. Consult your system manufacturer.
Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and
other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code
names is at the sole risk of the user.
Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record
product roadmaps.
Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as
SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the
results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that
product when combined with other products. For more information go to: http://www.Intel.com/performance
Legal
8/18/2015
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY
INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY
EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A
PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance
tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions.
Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in
fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the
U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not
unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other
optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use
with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
Notice revision #20110804
Legal Disclaimer and Optimization Notice
Outline
• Embree Overview
• Embree Performance
• Embree API
• Catmull Clark Subdivision Surfaces
Embree Overview
Usage of Ray Tracing Today
• Movie industry transitioning to ray tracing (better image quality, faster feedback)
• High quality rendering for commercials, prints, etc.
• Provides higher fidelity for virtual design (automotive industry, architectural design, etc.)
• Various kinds of simulations (lighting, sound, particles, collision detection, etc.)
• Prebaked lighting in games
• etc.
Writing a Fast Ray Tracer is Difficult
• Need to multi-thread: easy for rendering but difficult for hierarchy construction
• Need to vectorize: efficient use of SIMD units, different ISAs (SSE, AVX, AVX2, AVX-512, KNCNI)
• Need deep domain knowledge: many different data structures (kd-trees, octrees, grids, BVH2, BVH4, ..., hybrid structures) and algorithms (single rays, packets, large packets, stream tracing, ...) to choose from
• Need to support different CPUs: different ISAs/CPU types favor different data structures, data layouts, and algorithms
Observations
• Ray tracers are often not sufficiently optimized
• Ray traversal consumes a large share of a renderer's cycles (often over 70%)
• Ray tracing can be expressed by a small number of commonly used operations (build and traversal)
• A ray tracing kernel library has the potential to speed up many rendering applications
Embree Ray Tracing Kernels
• Provides highly optimized and scalable ray tracing kernels (data structure build and ray traversal)
• Highest ray tracing performance (1.5x – 6x speedup reported by users)
• Support for latest CPUs (e.g. AVX512 support)
• Targets application developers in professional rendering environments
• API for easy integration into applications
• Free and Open Source under Apache 2.0 license (http://embree.github.com)
Embree Features
• Find closest hit and find any hit kernels (rtcIntersect, rtcOccluded)
• Single rays and ray packets (4, 8, 16)
• High quality and high performance hierarchy builders
• Intel® SPMD Program Compiler (ISPC) supported
• Triangles, Instances, Hair, Catmull Clark Subdivision Surfaces, Displacement Mapping
• Extensible (User Defined Geometry, Intersection filter functions, Open Source)
• SSE, AVX, AVX2, and AVX512 support
New Embree Features
• Catmull Clark Subdivision Surfaces
– Smooth surface primitive
• Vector Displacement Mapping
– Add geometric detail
• Initial AVX512 support
– 16 wide AVX512 traversal kernels
Embree System Overview
Embree API (C++ and ISPC)
Ray Tracing Kernel Selection
• Accel. structure: bvh4.triangle4, bvh8.triangle8, bvh4aos.triangle1, bvh4.grid, …
• Builders: SAH builder, Spatial split builder, Morton code builder, BVH Refitter
• Traversal: Single ray (SSE2, AVX, AVX2), packet (SSE2), hybrid (SSE4.2), ...
• Intersection: Möller-Trumbore, Plücker Variant, Bezier Curve, Triangle Grids
• Subdiv Engine: B-Spline Patch, Gregory Patch, TessellationCache, Displ. Mapping
Common Vector and SIMD Library (Vec3f, Vec3fa, float4, float8, float16, SSE2, SSE4.1, AVX, AVX2, AVX512)
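The Möller-Trumbore test named in the Intersection box can be sketched for a single ray and a single triangle as follows. This is a plain scalar illustration, not Embree's vectorized kernel; Vec3, TriangleHit, and intersectTriangle are hypothetical names introduced only for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3D vector helpers (hypothetical, for this sketch only).
struct Vec3 { float x, y, z; };

inline Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(const Vec3& a, const Vec3& b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Hit distance t along the ray and barycentric coordinates (u, v).
struct TriangleHit { float t, u, v; };

// Scalar Möller-Trumbore ray/triangle test.
// Returns true and fills 'hit' if the ray hits the triangle within [tnear, tfar].
bool intersectTriangle(const Vec3& org, const Vec3& dir, float tnear, float tfar,
                       const Vec3& v0, const Vec3& v1, const Vec3& v2, TriangleHit& hit)
{
  assert(tnear <= tfar);
  const Vec3 e1 = sub(v1, v0);
  const Vec3 e2 = sub(v2, v0);
  const Vec3 p = cross(dir, e2);
  const float det = dot(e1, p);
  if (std::fabs(det) < 1e-8f) return false;        // ray is (nearly) parallel to the triangle
  const float invDet = 1.0f / det;
  const Vec3 s = sub(org, v0);
  const float u = dot(s, p) * invDet;
  if (u < 0.0f || u > 1.0f) return false;          // outside along the first edge
  const Vec3 q = cross(s, e1);
  const float v = dot(dir, q) * invDet;
  if (v < 0.0f || u + v > 1.0f) return false;      // outside along the second edge
  const float t = dot(e2, q) * invDet;
  if (t < tnear || t > tfar) return false;         // outside the ray interval
  hit.t = t; hit.u = u; hit.v = v;
  return true;
}
```

The (u, v) pair plays the same role as the local hit coordinates a ray query reports, and t corresponds to the hit distance.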
Why Ray Tracing on CPUs?
• High ray tracing performance for photorealistic rendering
• Large memory capacity to render really complex models
• Robust tools to develop and debug rendering applications
• Complex shading and rendering applications are executed efficiently (e.g. light cuts with large per pixel state)
Why should I use Embree?
• Hides complexity of writing high performance ray tracing kernels
→ gives you more time to develop your renderer
• High performance on latest Intel® Xeon® Processor family and Intel® Xeon Phi™ coprocessor products
→ Embree always up to date with latest ISA instruction sets
• High potential performance gain (1.5x – 6x rendering speedup reported by Embree users)
How can I use Embree?
• As a benchmark to identify performance issues in existing applications
• Adopt algorithms from Embree into your code
– However, Embree internals change frequently!
• As a library through the Embree API (recommended)
– Benefit from future Embree improvements!
Embree v2.6.1 Performance
Performance Methodology
• Models and illumination effects representative of professional rendering environments
• Path tracer with different material types, different light types, about 2000 lines of code
• Evaluation on typical Intel® Xeon® rendering workstation* and Intel® Xeon Phi™ Coprocessor**
• Compare against state of the art GPU*** methods (using OptiX™ 3.8.0 and CUDA® 7.0.28)
• Identical implementations in ISPC (Xeon®), ISPC (Xeon Phi™), OptiX™ (GTX™ Titan X)
Test models:
• Imperial Crown of Austria, 4.3M triangles
• Bentley 4.5l Blower (1927), 2.3M triangles
• Asian Dragon, 12.3M triangles
* Dual Socket Intel® Xeon® E5-2699 v3, 2x18 cores @ 2.30 GHz
** Intel® Xeon Phi™ 7120, 61 cores @ 1.238 GHz
*** NVIDIA® GeForce® GTX™ Titan X
Build Performance for Static Scenes
SAH Build (high quality), in million triangles/second (bar chart, three bars per platform):
• Intel® Xeon® E5-2699 v3 Processor (2 x 18 cores, 2.3 GHz): 40, 41, 45
• Intel® Xeon Phi™ 7120 Coprocessor (61 cores, 1.28 GHz): 32.3, 31.7, 35.1
Build Performance for Dynamic Scenes
Morton Build, in million triangles/second (bar chart, three bars per platform):
• Intel® Xeon® E5-2699 v3 Processor (2 x 18 cores, 2.3 GHz): 112, 108, 105
• Intel® Xeon Phi™ 7120 Coprocessor (61 cores, 1.28 GHz): 160.1, 140.4, 162.1
Ray Tracing Performance (incl. Shading)
In million rays/second (bar chart, three bars per platform):
• Intel® Xeon® E5-2699 v3 Processor (2 x 18 cores, 2.3 GHz): 107.2, 129.6, 134.98
• Intel® Xeon Phi™ 7120 Coprocessor (61 cores, 1.28 GHz): 64.96, 75.36, 82.62
• NVIDIA® GeForce® GTX™ Titan X Coprocessor (12 GB RAM): 29.472, 35.04, 38.76
Embree API
Embree API Overview
• Version 2 of the Embree API
• Compact and easy to use
• C++ and ISPC version
• Hides implementation details (e.g. different spatial index structures)
Scene Object
• Scene is a container for a set of geometries
• Scene flags are passed at creation time
• Scene geometry changes have to be committed (rtcCommit), which triggers the BVH build
/* include embree headers */
#include <embree2/rtcore.h>

int main ()
{
  /* initialize at application startup */
  rtcInit ();

  /* create scene */
  RTCScene scene = rtcNewScene (RTC_SCENE_STATIC, RTC_INTERSECT1);

  /* add geometries */
  /* ... shown on a later slide ... */

  /* commit changes */
  rtcCommit (scene);

  /* trace rays */
  /* ... shown on a later slide ... */

  /* cleanup at application exit */
  rtcExit ();
}
Scene Types
• Static Scenes
– Geometry cannot get changed
– High quality BVH build (SAH) → faster ray traversal
– For final frame rendering
• Dynamic Scenes
– Geometries can get added, modified, and removed
– Faster build (Morton) → slower ray traversal
– Preview mode during geometric modeling
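A Morton build orders primitives along a space-filling curve, which is why it is cheap compared to an SAH build. The sketch below shows one common way to compute a 30-bit Morton code from a point in the unit cube. It is a generic illustration under that assumption, not Embree's builder; expandBits, quantize10, morton3D, and mortonFromPoint are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Spread the lower 10 bits of v so that two zero bits separate each original bit.
inline std::uint32_t expandBits(std::uint32_t v) {
  v &= 0x3FFu;
  v = (v | (v << 16)) & 0x030000FFu;
  v = (v | (v << 8))  & 0x0300F00Fu;
  v = (v | (v << 4))  & 0x030C30C3u;
  v = (v | (v << 2))  & 0x09249249u;
  return v;
}

// Interleave three 10-bit grid coordinates into one 30-bit code (x in the lowest bit).
inline std::uint32_t morton3D(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
  assert(x < 1024u && y < 1024u && z < 1024u);
  return expandBits(x) | (expandBits(y) << 1) | (expandBits(z) << 2);
}

// Map a coordinate in [0,1] to a 10-bit grid cell, clamping out-of-range input.
inline std::uint32_t quantize10(float x) {
  const float scaled = std::min(std::max(x * 1024.0f, 0.0f), 1023.0f);
  return static_cast<std::uint32_t>(scaled);
}

// Morton code of a point already normalized to the unit cube
// (e.g. a primitive centroid relative to the scene bounds).
inline std::uint32_t mortonFromPoint(float x, float y, float z) {
  return morton3D(quantize10(x), quantize10(y), quantize10(z));
}
```

Sorting primitives by such codes places spatially nearby primitives next to each other, so a hierarchy can be derived from the sorted order without evaluating a cost function at every split.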
Triangle Mesh
• Contains vertex and index buffers
• Number of triangles and vertices set at creation time
• Linear motion blur supported (2 vertex buffers)
/* add mesh to scene (last argument: number of time steps) */
unsigned int geomID = rtcNewTriangleMesh
  (scene, RTC_GEOMETRY_STATIC, numTriangles, numVertices, 1);

/* fill data buffers */
/* ... shown on a later slide ... */

/* add more geometries */
...

/* commit changes */
rtcCommit (scene);
Buffer Sharing
• Recommended to use buffer sharing
• Reduces memory consumption
• Application manages buffers (buffer has to stay alive as long as geometry is alive)
• Support for stride and offset allows application flexibility in its data layout
Buffer Sharing Example
/* application vertex and index layout */
struct Vertex { float x,y,z,s,t; };
struct Triangle { int materialID, v0, v1, v2; };
/* share buffers with application */
rtcSetBuffer(scene,geomID,RTC_VERTEX_BUFFER,vertexPtr,0,sizeof(Vertex));
rtcSetBuffer(scene,geomID,RTC_INDEX_BUFFER ,indexPtr ,4,sizeof(Triangle));
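How a byte offset and a stride locate element i in a shared buffer can be sketched in plain C++. The Vertex and Triangle layouts are taken from the example above; BufferView, loadPosition, and loadIndices are hypothetical helpers written for this illustration and are not part of the Embree API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

/* application vertex and index layout, as in the example above */
struct Vertex   { float x, y, z, s, t; };
struct Triangle { int materialID, v0, v1, v2; };

// Hypothetical view of an application-owned buffer: element i starts at
// base + offset + i * stride (all measured in bytes).
struct BufferView {
  const unsigned char* base;
  std::size_t offset;
  std::size_t stride;

  const unsigned char* element(std::size_t i) const {
    assert(base != nullptr);
    return base + offset + i * stride;
  }
};

// Read the three position floats of vertex i; the trailing s,t are skipped by the stride.
inline void loadPosition(const BufferView& view, std::size_t i, float out[3]) {
  std::memcpy(out, view.element(i), 3 * sizeof(float));
}

// Read the three vertex indices of triangle i; the leading materialID is skipped by the offset.
inline void loadIndices(const BufferView& view, std::size_t i, int out[3]) {
  std::memcpy(out, view.element(i), 3 * sizeof(int));
}
```

With this layout the index view uses offset 4 (the size of materialID, assuming a 4-byte int) and stride sizeof(Triangle), matching the arguments in the example. Writing the offset as offsetof(Triangle, v0) avoids hard-coding the size.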
Tracing Rays
• rtcIntersect (scene, ray) reports first intersection
• rtcOccluded (scene, ray) reports any intersection
• Packet versions for ray packets of size 4, 8, and 16
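rtcIntersect: Ray Structure Inputs
• Ray origin and direction (org, dir)
• Ray interval (tnear, tfar)
• Time used for motion blur [0,1]
struct RTCRay
{
  Vec3f org;    // input: ray origin
  Vec3f dir;    // input: ray direction
  float tnear;  // input: start of ray interval
  float tfar;   // input: end of ray interval
  float time;   // input: time for motion blur, in [0,1]
  Vec3f Ng;
  float u;
  float v;
  int geomID;
  int primID;
  int instID;
};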
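rtcIntersect: Ray Structure Outputs
• Hit distance (tfar)
• Unnormalized geometry normal (Ng)
• Local hit coordinates (u,v)
• Geometry identifier of hit geometry (geomID)
• Index of hit primitive of geometry (primID)
• Geometry identifier of hit instance (instID)
• No shading normals, texture coordinates, etc.
struct RTCRay
{
  Vec3f org;
  Vec3f dir;
  float tnear;
  float tfar;   // output: hit distance
  float time;
  Vec3f Ng;     // output: unnormalized geometry normal
  float u;      // output: local hit coordinate
  float v;      // output: local hit coordinate
  int geomID;   // output: geometry identifier of hit geometry
  int primID;   // output: index of hit primitive
  int instID;   // output: geometry identifier of hit instance
};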
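The input/output split above suggests a small setup helper: fill the inputs, mark the ray as "no hit", and test geomID after the query. The sketch below uses a simplified stand-in struct that mirrors the fields on the slide rather than the real header. Ray, makeRay, hasHit, and kInvalidGeometryID (standing in for RTC_INVALID_GEOMETRY_ID) are assumptions made for this illustration.

```cpp
#include <cassert>
#include <limits>

// Simplified stand-ins that mirror the slide; not the definitions from the Embree headers.
struct Vec3f { float x, y, z; };
const int kInvalidGeometryID = -1;   // assumed stand-in for RTC_INVALID_GEOMETRY_ID

struct Ray {
  Vec3f org; Vec3f dir;              // inputs
  float tnear; float tfar;           // inputs (tfar doubles as the hit-distance output)
  float time;                        // input
  Vec3f Ng; float u; float v;        // outputs
  int geomID; int primID; int instID;// outputs
};

// Fill the inputs and mark all hit identifiers as invalid before tracing.
inline Ray makeRay(const Vec3f& org, const Vec3f& dir,
                   float tnear = 0.0f,
                   float tfar = std::numeric_limits<float>::infinity(),
                   float time = 0.0f)
{
  assert(tnear <= tfar);
  Ray ray = Ray();                   // zero-initialize the output fields
  ray.org = org;
  ray.dir = dir;
  ray.tnear = tnear;
  ray.tfar = tfar;
  ray.time = time;
  ray.geomID = kInvalidGeometryID;
  ray.primID = kInvalidGeometryID;
  ray.instID = kInvalidGeometryID;
  return ray;
}

// After a closest-hit query, a valid geomID signals that something was hit.
inline bool hasHit(const Ray& ray) { return ray.geomID != kInvalidGeometryID; }
```

Because only geometric data comes back, shading normals and texture coordinates have to be looked up by the application from geomID, primID, and (u,v).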
Intel® SPMD Program Compiler (ISPC)
• Simplifies writing vectorized renderer
• C-based language plus vector extensions
• Scalar looking code that gets vectorized automatically
• Guaranteed vectorization
• Compilation to different vector ISAs (SSE, AVX, AVX2, AVX512, Xeon Phi™)
• Available as Open Source from http://ispc.github.com
Embree Rendering: ISPC Example
/* loop over all screen pixels (the upper bound of a foreach range is exclusive) */
foreach (y=0 ... screenHeight, x=0 ... screenWidth)
{
  /* create and trace primary ray */
  RTCRay ray = make_Ray(p,normalize(x*vx + y*vy + vz),eps,inf);
  rtcIntersect(scene,ray);

  /* environment shading */
  if (ray.geomID == RTC_INVALID_GEOMETRY_ID) {
    pixels[y*screenWidth+x] = make_Vec3f(0.0f); continue;
  }

  /* calculate hard shadows */
  RTCRay shadow = make_Ray(ray.org+ray.tfar*ray.dir,neg(lightDir),eps,inf);
  rtcOccluded(scene,shadow);
  if (shadow.geomID == RTC_INVALID_GEOMETRY_ID)
    pixels[y*screenWidth+x] = colors[ray.primID]*(0.5f + clamp(-dot(lightDir,normalize(ray.Ng)),0.0f,1.0f));
  else
    pixels[y*screenWidth+x] = colors[ray.primID]*0.5f;
}
Dynamic Scenes
• Create scene with RTC_SCENE_DYNAMIC flag
• Report modified meshes with rtcUpdate call
• Possibly enable (rtcEnable), disable (rtcDisable), add (rtcNewXX), and delete (rtcDeleteGeometry) geometries
for each frame
{
  for each dynamic mesh
  {
    /* modify shared buffers */
    modify mesh->indices
    modify mesh->vertices

    /* signal mesh update (rtcUpdate takes the geometry ID; mesh->geomID is a placeholder field) */
    rtcUpdate(scene,mesh->geomID);
  }

  /* commit changes */
  rtcCommit (scene);

  /* trace rays */
  ...
}
Intersection Filter Functions
• Per geometry callback that is called during traversal for each primitive intersection
• Callback can accept or reject hit
• Can be used for:
– Trimming curves (e.g. modeling tree leaves)
– Transparent shadows (reject and accumulate)
– Find all hits (reject and collect)
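The "reject and collect" pattern can be illustrated without the library: the callback records every candidate hit through its user pointer and then rejects it, so traversal keeps going. Everything below is a mock written for this sketch. Ray, Hit, collectAllHits, mockTraverse, and kInvalidGeometryID are hypothetical, and mockTraverse only imitates a traversal that treats an invalidated geomID as a rejection.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kInvalidGeometryID = -1;   // assumed stand-in for RTC_INVALID_GEOMETRY_ID

// Reduced mock ray: interval plus the hit identifiers the filter needs.
struct Ray { float tnear, tfar; int geomID, primID; };
struct Hit { float t; int primID; };

// Filter callback: record the hit in the user-provided list, then reject it.
void collectAllHits(void* userPtr, Ray& ray) {
  assert(userPtr != nullptr);
  std::vector<Hit>* hits = static_cast<std::vector<Hit>*>(userPtr);
  hits->push_back(Hit{ray.tfar, ray.primID});
  ray.geomID = kInvalidGeometryID;   // reject, so farther hits are still visited
}

typedef void (*FilterFunc)(void* userPtr, Ray& ray);

// Mock traversal: for each candidate inside the ray interval, tentatively store the hit,
// call the filter, and restore the previous ray state if the filter rejected the hit.
void mockTraverse(const std::vector<Hit>& candidates, Ray& ray, FilterFunc filter, void* userPtr) {
  for (std::size_t i = 0; i < candidates.size(); i++) {
    const Hit& c = candidates[i];
    if (c.t < ray.tnear || c.t > ray.tfar) continue;
    const Ray saved = ray;
    ray.tfar = c.t;
    ray.geomID = 0;
    ray.primID = c.primID;
    filter(userPtr, ray);
    if (ray.geomID == kInvalidGeometryID) ray = saved;   // hit was rejected
  }
}
```

Since every hit is rejected, the ray interval never shrinks, and the list ends up holding all intersections along the ray in traversal order rather than sorted by distance.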
Filter Function Example
/* procedural intersection filter function */
void intersectionFilter(void* userPtr, RTCRay& ray)
{
  Vec3fa h = ray.org + ray.dir*ray.tfar;
  float v = abs(sin(4.0f*h.x)*cos(4.0f*h.y)*sin(4.0f*h.z));
  float T = clamp((v-0.1f)*3.0f,0.0f,1.0f); // T is clamped to [0,1]
  if (T < 1.0f) return;                     // accept hit unless T reaches 1
  ray.geomID = RTC_INVALID_GEOMETRY_ID;     // reject hit
}

/* set intersection filter for the cube */
rtcSetIntersectionFilterFunction(scene, geomID, (RTCFilterFunc)&intersectionFilter);
rtcSetOcclusionFilterFunction (scene, geomID, (RTCFilterFunc)&intersectionFilter);
rtcSetUserData (scene, geomID, NULL);
Hair Geometry
• Hair curves represented as cubic Bézier curves with varying radius
• High performance through use of oriented bounding boxes
• Low memory consumption through direct ray/curve intersection
[Figure: hair curve with control points p0..p3 and radii r0..r3]
Catmull Clark Subdivision Surfaces
Catmull Clark Subdivision Surfaces
• Converts coarse mesh into smooth surface by subdivision
• Generalization of bi-cubic B-Spline surfaces to arbitrary topology
• Embree is compatible with OpenSubdiv 3.0
Catmull Clark Subdivision
CC Subdivision Surface Advantages
• Low resolution base mesh controls high resolution surface
• Smoothness always guaranteed (C2 continuous almost everywhere)
• Support for arbitrary topology (no trimming required as with NURBS)
• Creases allow introducing sharp features
• Support in most modeling tools
• Established as standard in movie production
[Image: Inside-Out (2015), Pixar, Walt Disney Pictures]
Embree Subdivision Features
• Semi-sharp edge creases
• Semi-sharp vertex creases
• Vertex attribute interpolation
• Tessellation level per edge
• Non-manifolds and Holes
• Boundary modes
• Triangles, Quads, Pentagons, ...
• Displacement mapping
Embree Subdivision Example
unsigned geomID = rtcNewSubdivisionMesh (scene, RTC_GEOMETRY_STATIC,
numFaces, numIndices, numVertices,
numEdgeCreases, numVertexCreases, numHoles);
rtcSetBuffer (scene,geomID,RTC_VERTEX_BUFFER, vertices, 0, sizeof(float3));
rtcSetBuffer (scene,geomID,RTC_INDEX_BUFFER , indices, 0, sizeof(int));
rtcSetBuffer (scene,geomID,RTC_FACE_BUFFER , faces, 0, sizeof(int));
rtcSetBuffer (scene,geomID,RTC_LEVEL_BUFFER , levels, 0, sizeof(float));
rtcSetBuffer (scene,geomID,RTC_EDGE_CREASE_INDEX_BUFFER,...);
rtcSetBuffer (scene,geomID,RTC_EDGE_CREASE_WEIGHT_BUFFER,...);
rtcSetBuffer (scene,geomID,RTC_VERTEX_CREASE_INDEX_BUFFER,...);
rtcSetBuffer (scene,geomID,RTC_VERTEX_CREASE_WEIGHT_BUFFER,...);
rtcSetBuffer (scene,geomID,RTC_HOLE_BUFFER,holes,0,sizeof(char));
Embree Subdivision Implementation
• Feature adaptive subdivision to evaluate patch (subdivides only at irregular vertices and crease features)
• Fast path for B-Spline patches and Gregory patches
• Tessellation Cache limits memory consumption (trade memory for performance)
[Figure: feature adaptive subdivision into B-Spline patches (green) and Gregory patches (blue)]
Embree Subdivision Performance
Intel® Xeon® E5-2690, 2x 8 cores, 2.9 GHz; 1024 x 1024 pixels; one column per test model

Patches         16        52k       53k
Edge Creases    0         0         30k
Micro Quads     1048k     831k      837k
Walkthrough     32 fps    36 fps    23 fps
Same View       66 fps    51 fps    56 fps
Vertex Data Interpolation
• Interpolates arbitrary user data over geometries (non-trivial for subdivision geometries)
• Interpolated data P as well as dPdu and dPdv can be calculated at arbitrary location
• Enables smooth normals and anisotropic texture lookups
• Different rules for interpolation of texture coordinates supported (by evaluation of second subdiv mesh)
Vertex Data Interpolation Example
RTCScene scene = rtcNewScene (RTC_SCENE_STATIC, RTC_INTERSECT1 | RTC_INTERPOLATE);
...
unsigned geomID = rtcNewSubdivisionMesh (...);
rtcSetBuffer (scene,geomID,RTC_INDEX_BUFFER, indices, 0, sizeof(int));
rtcSetBuffer (scene,geomID,RTC_VERTEX_BUFFER, vertices, 0, sizeof(float3));
rtcSetBuffer (scene,geomID,RTC_USER_VERTEX_BUFFER, vertex_colors, 0, sizeof(float3));
...
rtcCommit (scene);
...
rtcIntersect (scene, ray);
...
/* interpolate position and its derivatives at the hit point */
float3 P, dPdu, dPdv;
rtcInterpolate (scene, geomID, ray.primID, ray.u,ray.v, RTC_VERTEX_BUFFER, &P, &dPdu, &dPdv, 3);
/* interpolate the user data (no derivatives requested) */
float3 color;
rtcInterpolate (scene, geomID, ray.primID, ray.u,ray.v, RTC_USER_VERTEX_BUFFER, &color, 0,0, 3);
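Once dPdu and dPdv are available, a smooth normal is typically taken as their normalized cross product. The sketch below assumes a simple three-float struct for float3 (the example above does not define it). cross, normalize, and smoothNormal are hypothetical helpers, and the orientation of the result depends on the parameterization.

```cpp
#include <cassert>
#include <cmath>

// Assumed layout for the float3 used in the example above.
struct float3 { float x, y, z; };

inline float3 cross(const float3& a, const float3& b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Scale a vector to unit length; the caller must not pass a zero vector.
inline float3 normalize(const float3& v) {
  const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
  assert(len > 0.0f);
  return {v.x / len, v.y / len, v.z / len};
}

// Smooth shading normal from the two interpolated surface tangents.
inline float3 smoothNormal(const float3& dPdu, const float3& dPdv) {
  return normalize(cross(dPdu, dPdv));
}
```

At points where the two tangents are parallel or vanish, the cross product degenerates, so a renderer would fall back to the geometry normal there.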
Displaced Subdivision Surface
• Support for vector displacement
• Callback function displaces vertex positions
• Bounded displacement allows for lazy evaluation
• Smooth normals possible through approximation:
Q = P + D*Ng
dQdu ≈ dPdu + dDdu*Ng
dQdv ≈ dPdv + dDdv*Ng
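The approximation follows from the product rule, assuming P, D, and Ng are all treated as functions of (u,v): differentiating Q produces a third term that contains the derivative of the normal, and the formulas above omit that term. Taking the shading normal as the cross product of the two displaced tangents is one common choice and is an assumption here, not something the slide states.

```latex
\begin{aligned}
Q(u,v) &= P(u,v) + D(u,v)\,N_g(u,v) \\
\frac{\partial Q}{\partial u}
  &= \frac{\partial P}{\partial u} + \frac{\partial D}{\partial u}\,N_g + D\,\frac{\partial N_g}{\partial u}
  \;\approx\; \frac{\partial P}{\partial u} + \frac{\partial D}{\partial u}\,N_g \\
\frac{\partial Q}{\partial v}
  &= \frac{\partial P}{\partial v} + \frac{\partial D}{\partial v}\,N_g + D\,\frac{\partial N_g}{\partial v}
  \;\approx\; \frac{\partial P}{\partial v} + \frac{\partial D}{\partial v}\,N_g \\
N_s &\propto \frac{\partial Q}{\partial u} \times \frac{\partial Q}{\partial v}
\end{aligned}
```

The dropped term is small when the displacement D is small or the base normal varies slowly across the patch.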
Displaced Subdivision Surface Example
/* displaces N vertex positions along their normals */
void displacementFunction(
  void* ptr, int geomID, int primID,
  const float* u, const float* v,
  const float* nx, const float* ny, const float* nz,
  float* px, float* py, float* pz,
  size_t N)
{
  for (size_t i = 0; i<N; i++) {
    float D = displacement(...);
    px[i] += D*nx[i];
    py[i] += D*ny[i];
    pz[i] += D*nz[i];
  }
}

/* bounds of the displacement, passed when registering the callback */
BBox3fa bounds(...);
rtcSetDisplacementFunction (scene,geomID,displacementFunction,&bounds);
56
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Summary
• Embree delivers the highest ray tracing performance on CPUs
• Embree can speed up many ray tracing applications
• Embree is easy to use through its API
• Subdivision surface support compatible with OpenSubdiv 3.0
• Free and Open Source (https://embree.github.com)
Questions?
https://embree.github.io
embree@googlegroups.com
8/18/2015
Technical Overview
Monte Carlo Ray Tracing
• Stochastic integration of all potential light paths between source and pixel
• Follows light backwards from pixel to light source
• Produces incoherent ray distributions
[Figure labels: Light, Pixel]
Two Kinds of Ray Distributions
Incoherent Rays
(typical for Monte Carlo)
Coherent Rays
BVH Acceleration Structure
Solution Space for Vectorized Ray Tracing
              Single Box            Multi Box
Single Ray    Scalar Traversal      Single Ray SIMD Traversal
Multi Ray     Packet Traversal      Independent Ray Traversal
BVH4 Spatial Index Structure
/* ssef/ssei: 4-wide SSE float/int vectors, one lane per child or triangle */
struct Node4 {
  ssef minx, miny, minz;   /* lower bounds of the 4 child boxes */
  ssef maxx, maxy, maxz;   /* upper bounds of the 4 child boxes */
  Node4* child[4];
};
struct Triangle4 {
  ssef v0x,v0y,v0z;        /* first vertex of 4 triangles */
  ssef e1x,e1y,e1z;        /* first edge */
  ssef e2x,e2y,e2z;        /* second edge */
  ssef Nx,Ny,Nz;           /* normal */
  ssei id0,id1;            /* two IDs per triangle */
};
BVH4 Traversal
For each dimension:
• Intersect ray with near plane of each box in SIMD
• Intersect ray with far plane of each box in SIMD
• Clip the near and the far parameters (box hit if near <= far)
[Figure labels: near1, near2, near3, near4, far1, far3, far4]
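The per-dimension plane test described on the BVH4 Traversal slide can be sketched without SSE types. The code below is an illustration under stated assumptions, not Embree's implementation: the Node4 here stores plain float arrays, Ray and intersectNode4 are hypothetical names, and each loop iteration plays the role of one SIMD lane. The ray is assumed to have no zero direction component, so that the reciprocal direction is finite.

```cpp
#include <algorithm>
#include <cassert>

// Four child boxes in structure-of-arrays layout, mirroring Node4 on the
// slide but with plain float arrays instead of SSE registers.
struct Node4 {
  float minx[4], miny[4], minz[4];
  float maxx[4], maxy[4], maxz[4];
};

// Hypothetical ray type for this sketch.
struct Ray {
  float org[3];
  float rdir[3];      // reciprocal direction, 1/dir (assumed finite)
  float tnear, tfar;  // valid ray interval
};

// Returns a 4-bit mask; bit i is set when the ray hits child box i.
inline int intersectNode4(const Node4& n, const Ray& r) {
  int mask = 0;
  for (int i = 0; i < 4; i++) {  // one iteration per SIMD lane
    // Ray parameters at the two bounding planes of each axis.
    const float tx0 = (n.minx[i] - r.org[0]) * r.rdir[0];
    const float tx1 = (n.maxx[i] - r.org[0]) * r.rdir[0];
    const float ty0 = (n.miny[i] - r.org[1]) * r.rdir[1];
    const float ty1 = (n.maxy[i] - r.org[1]) * r.rdir[1];
    const float tz0 = (n.minz[i] - r.org[2]) * r.rdir[2];
    const float tz1 = (n.maxz[i] - r.org[2]) * r.rdir[2];
    // Clip: latest entry and earliest exit over all axes and the ray interval.
    const float tNear = std::max(std::max(std::min(tx0, tx1), std::min(ty0, ty1)),
                                 std::max(std::min(tz0, tz1), r.tnear));
    const float tFar  = std::min(std::min(std::max(tx0, tx1), std::max(ty0, ty1)),
                                 std::min(std::max(tz0, tz1), r.tfar));
    if (tNear <= tFar) mask |= 1 << i;  // box hit if near <= far
  }
  return mask;
}
```

The structure-of-arrays layout is what lets a real implementation replace the loop with a handful of 4-wide instructions, since the same coordinate of all four boxes sits in one register.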
Optimizing BVH4 Traversal for CPUs
• Reduce number of executed instructions
• Reduce data dependencies of critical paths
• Take advantage of special instructions (e.g. SSE, bitscan, etc.)
• Optimize most frequently executed code paths
Optimizing BVH4 Traversal for CPUs (SSE)
• Load front/back plane based on direction sign of the ray.
• Balanced min/max trees
• Bitscans to iterate through hit children
• Early exit for 0 children hit (20%)
• 1 child hit (50%): keep next node in register (instead of push/pop sequence)
• 2 children hit (20%): keep next node in register, sort using a branch
while (true) {
  if (isLeaf(node)) goto leaf;
  /* distances to the near and far planes of the four child boxes;
     nearX..farZ are plane offsets chosen from the ray direction sign */
  ssef tNearX = (norg.x + node[nearX]) * rdir.x;
  ssef tNearY = (norg.y + node[nearY]) * rdir.y;
  ssef tNearZ = (norg.z + node[nearZ]) * rdir.z;
  ssef tFarX = (norg.x + node[farX ]) * rdir.x;
  ssef tFarY = (norg.y + node[farY ]) * rdir.y;
  ssef tFarZ = (norg.z + node[farZ ]) * rdir.z;
  /* balanced max/min trees */
  ssef tNear = max(max(tNearX,tNearY),
                   max(tNearZ,ray.near));
  ssef tFar = min(min(tFarX,tFarY),
                  min(tFarZ,ray.far));
  int hitmask = movemask(tNear <= tFar);
  if (hitmask == 0) goto pop;      /* no child hit */
  int c = bitscan(hitmask);
  hitmask = clearbit(hitmask,c);
  if (hitmask == 0) {              /* exactly one child hit */
    node = node.child[c]; continue;
  } …
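Two of the optimizations listed above can be shown in isolation: picking the near and far plane from the sign of the ray direction, and walking the hit mask with a bitscan. The sketch below is scalar and self-contained, and all names in it (Interval, slab, bitscan, clearbit, hitChildren) are hypothetical stand-ins, not Embree functions. The bitscan here is a portable loop; a real implementation would presumably map it to a single instruction.

```cpp
#include <cassert>
#include <vector>

// Entry and exit ray parameters for one axis.
struct Interval { float tNear, tFar; };

// Chooses the near/far plane from the sign of the reciprocal direction, so
// no per-axis min/max is needed. Uses the negated origin as on the slide.
inline Interval slab(float lower, float upper, float org, float rdir) {
  const bool positive = rdir >= 0.0f;
  const float nearPlane = positive ? lower : upper;
  const float farPlane  = positive ? upper : lower;
  const float norg = -org;
  return { (norg + nearPlane) * rdir, (norg + farPlane) * rdir };
}

// Index of the lowest set bit; mask must be non-zero.
inline int bitscan(unsigned mask) {
  int i = 0;
  while ((mask & 1u) == 0u) { mask >>= 1; ++i; }
  return i;
}

// Returns mask with bit c cleared.
inline unsigned clearbit(unsigned mask, int c) {
  return mask & ~(1u << c);
}

// Lists the indices of all hit children, lowest bit first.
inline std::vector<int> hitChildren(unsigned hitmask) {
  std::vector<int> out;
  while (hitmask != 0u) {
    const int c = bitscan(hitmask);
    hitmask = clearbit(hitmask, c);
    out.push_back(c);
  }
  return out;
}
```

In this sketch the sign test runs on every call, whereas the slide's traversal loop uses plane offsets that are already fixed when the loop starts. Note also that hitChildren visits children in bit order, while the slide sorts the two-children case by distance with a branch.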

  • 4.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Embree Overview  Embree Performance  Embree API  Catmull Clark Subdivision Surfaces Outline 8/18/2015 4
  • 5.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Embree Overview 8/18/2015 6
  • 6.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.8/18/2015 • Movie industry transitioning to ray tracing (better image quality, faster feedback) • High quality rendering for commercials, prints, etc. • Provides higher fidelity for virtual design (automotive industry, architectural design, etc.) • Various kind of simulations (lighting, sound, particles, collision detection, etc.) • Prebaked lighting in games • etc. Usage of Ray Tracing Today 7
  • 7.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Need to multi-thread: easy for rendering but difficult for hierarchy construction  Need to vectorize: efficient use of SIMD units, different ISAs (SSE, AVX, AVX2, AVX-512, KNCNI)  Need deep domain knowledge: many different data structures (kd-trees, octrees, grids, BVH2, BVH4, ..., hybrid structures) and algorithms (single rays, packets, large packets, stream tracing, ...) to choose  Need to support different CPUs: Different ISAs/CPU types favor different data structures, data layouts, and algorithms Writing a Fast Ray Tracer is Difficult 8/18/2015 8
  • 8.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Observations 8/18/2015  Ray tracers are often not sufficiently optimized  Ray traversal consumes a lot of cycles of renderer (often over 70%)  Ray tracing can be expressed by small number of commonly used operations (build and traversal)  Ray tracing kernel library has potential to speed up many rendering applications 9
  • 9.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Provides highly optimized and scalable Ray Tracing Kernels (data structure build and ray traversal)  Highest ray tracing performance (1.5x – 6x speedup reported by users)  Support for latest CPUs (e.g. AVX512 support)  Targets application developers in professional rendering environment  API for easy integration into applications  Free and Open Source under Apache 2.0 license (http://embree.github.com) Embree Ray Tracing Kernels 8/18/2015 10
  • 10.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Find closest and any hit kernel (rtcIntersect, rtcOccluded)  Single Rays and Ray Packets (4, 8, 16)  High quality and high performance hierarchy builders  Intel® SPMD Program Compiler (ISPC) supported  Triangles, Instances, Hair, Catmull Clark Subdivision Surfaces, Displacement Mapping  Extensible (User Defined Geometry, Intersection filter functions, Open Source)  SSE, AVX, AVX2, and AVX512 support Embree Features 8/18/2015 11
  • 11.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Catmull Clark Subdivision Surfaces – Smooth surface primitive  Vector Displacement Mapping – Add geometric detail  Initial AVX512 support – 16 wide AVX512 traversal kernels New Embree Features 8/18/2015 12
  • 12.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Embree System Overview 8/18/2015 13 Embree API (C++ and ISPC) Ray Tracing Kernel Selection Accel. structure bvh4.triangle4, bvh8.triangle8, bvh4aos.triangle1, bvh4.grid … Builders SAH builder Spatial split builder Morton code builder BVH Refitter Traversal Single ray (SSE2, AVX, AVX2), packet (SSE2), hybrid (SSE4.2), ... Common Vector and SIMD Library (Vec3f, Vec3fa, float4, float8, float16, SSE2, SSE4.1, AVX, AVX2, AVX512) Intersection MöllerTrumbore, Plücker Variant, Bezier Curve, Triangle Grids Subdiv Engine B-Spline Patch Gregory Patch TessellationCache Displ. Mapping
  • 13.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  High ray tracing performance for photorealistic rendering  Large memory capacity to render really complex models  Robust tools to develop and debug rendering application  Complex shading and rendering applications are executed efficiently (e.g. light cuts with large per pixel state) Why Ray Tracing on CPUs? 8/18/2015 15
  • 14.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.8/18/2015 16  Hides complexity of writing high performance ray tracing kernels  gives you more time developing your renderer  High performance on latest Intel® Xeon® Processor family and Intel® Xeon Phi™ coprocessor products  Embree always up to date with latest ISA instruction sets  High potential performance gain (1.5x – 6x rendering speedup reported by Embree users) Why should I use Embree?
  • 15.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  As a benchmark to identify performance issues in existing applications  Adopt algorithms from Embree to your code – However Embree internals change frequently!  As a library through the Embree API (recommended) – Benefit from future Embree improvements! How can I use Embree? 8/18/2015 17
  • 16.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Embree v2.6.1 Performance 19
  • 17.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Models and illumination effects representative for professional rendering environment  Path tracer with different material types, different light types, about 2000 lines of code  Evaluation on typical Intel® Xeon® rendering workstation* and Intel® Xeon Phi™ Coprocessor**  Compare against state of the art GPU*** methods (using OptiX™ 3.8.0 and CUDA® 7.0.28)  Identical implementations in ISPC (Xeon®), ISPC (Xeon Phi™), OptiX™ (GTX™ Titan X) Performance Methology 20 Imperial Crown of Austria 4.3M triangles Bentley 4.5l Blower (1927) 2.3M triangles Asian Dragon 12.3M triangles * Dual Socket Intel® Xeon® E5-2699 v3 2x18 cores @ 2.30GHz ** Intel® Xeon Phi™ 7120, 61 cores @ 1.238 GHz *** NVIDIA® GeForce® GTX™ Titan X
  • 18.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Build Performance for Static Scenes 40 41 45 32.3 31.7 35.1 0 50 100 150 Intel® Xeon® E5-2699 v3 Processor 2 x 18 cores, 2.3 GHz Intel® Xeon Phi™ 7120 Coprocessor 61 cores, 1.28 GHz 21 MillionTriangles/Second SAH Build (high quality)
  • 19.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Build Performance for Dynamic Scenes 112 108 105 160.1 140.4 162.1 0 50 100 150 Intel® Xeon® E5-2699 v3 Processor 2 x 18 cores, 2.3 GHz Intel® Xeon Phi™ 7120 Coprocessor 61 cores, 1.28 GHz 22 MillionTriangles/Second Morton Build
  • 20.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Ray Tracing Performance (incl. Shading) 107.2 129.6 134.98 64.96 75.36 82.62 29.472 35.04 38.76 0 20 40 60 80 100 120 140 Intel® Xeon® E5-2699 v3 Processor 2 x 18 cores, 2.3 GHz Intel® Xeon Phi™ 7120 Coprocessor 61 cores, 1.28 GHz NVIDIA® GeForce® GTX™ Titan X Coprocessor 12 GB RAM 23 MillionRays/Second
  • 21.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Embree API 8/18/2015 24
  • 22.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Version 2 of the Embree API  Compact and easy to use  C++ and ISPC version  Hides implementation details (e.g. different spatial index structures) Embree API Overview 8/18/2015 25
  • 23.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Scene is container for set of geometries  Scene flags passed at creation time  Scene geometry changes have to get commited (rtcCommit) which triggers BVH build Scene Object 26 /* include embree headers */ #include <embree2/rtcore.h> int main () { /* initialize at application startup */ rtcInit (); /* create scene */ RTCScene scene = rtcNewScene (RTC_SCENE_STATIC,RTC_INTERSECT1); /* add geometries */ ... later slide ... /* commit changes */ rtcCommit (scene); /* trace rays */ ... later slide ... /* cleanup at application exit */ rtcExit (); }
  • 24.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Static Scenes – Geometry cannot get changed – High quality BVH build (SAH)  faster ray traversal – For final frame rendering  Dynamic Scenes – Geometries can get added, modified, and removed – Faster build (Morton)  slower ray traversal – Preview mode during geometric modeling Scene Types 27
  • 25.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Triangle Mesh  Contains vertex and index buffers  Number of triangles and vertices set at creation time  Linear motion blur supported (2 vertex buffers) /* add mesh to scene */ unsigned int geomID = rtcNewTriangleMesh (scene, numTriangles, numVertices, 1); /* fill data buffers */ ... later slide ... /* add more geometries */ ... /* commit changes */ rtcCommit (scene); 29
  • 26.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Recommended to use buffer sharing  Reduces memory consumption  Application manages buffers (buffer has to stay alive as long as geometry is alive)  Support for stride and offset allows application flexibility in its data layout Buffer Sharing 30
  • 27.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Buffer Sharing Example /* application vertex and index layout */ struct Vertex { float x,y,z,s,t; }; struct Triangle { int materialID, v0, v1, v2; }; /* share buffers with application */ rtcSetBuffer(scene,geomID,RTC_VERTEX_BUFFER,vertexPtr,0,sizeof(Vertex)); rtcSetBuffer(scene,geomID,RTC_INDEX_BUFFER ,indexPtr ,4,sizeof(Triangle)); 31
  • 28.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Tracing Rays  rtcIntersect (scene, ray) reports first intersection  rtcOccluded (scene, ray) reports any intersection  Packet versions for ray packets of size 4,8, and 16 32
  • 29.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. rtcIntersect: Ray Structure Inputs  Ray origin and direction (org, dir)  Ray interval (tnear, tfar)  Time used for motion blur [0,1] struct RTCRay { Vec3f org; Vec3f dir; float tnear; float tfar; float time; Vec3f Ng; float u; float v; int geomID; int primID; int instID; } 33
  • 30.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. rtcIntersect: Ray structure Outputs  Hit distance (tfar)  Unnormalized geometry normal (Ng)  Local hit coordinates (u,v)  Geometry identifier of hit geometry (geomID)  Index of hit primitive of geometry (primID)  Geometry identifier of hit instance (instID)  No shading normals, texture coordinates, etc. struct RTCRay { Vec3f org; Vec3f dir; float tnear; float tfar; float time; Vec3f Ng; float u; float v; int geomID; int primID; int instID; } 34
  • 31.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 35  Simplifies writing vectorized renderer  C-based language plus vector extensions  Scalar looking code that gets vectorized automatically  Guaranteed vectorization  Compilation to different vector ISAs (SSE, AVX, AVX2, AVX512, Xeon Phi™)  Available as Open Source from http://ispc.github.com Intel® SPMD Program Compiler (ISPC)
  • 32.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. /* loop over all screen pixels */ foreach (y=0 ... screenHeight-1, x=0 ... screenWidth-1) { /* create and trace primary ray */ RTCRay ray = make_Ray(p,normalize(x*vx + y*vy + vz),eps,inf); rtcIntersect(scene,ray); /* environment shading */ if (ray.geomID == RTC_INVALID_GEOMETRY_ID) { pixels[y*screenWidth+x] = make_Vec3f(0.0f); continue; } /* calculate hard shadows */ RTCRay shadow = make_Ray(ray.org+ray.tfar*ray.dir,neg(lightDir),eps,inf); rtcOccluded(scene,shadow); if (shadow.geomID == RTC_INVALID_GEOMETRY_ID) pixels[y*width+x] = colors[ray.primID]*(0.5f + clamp(-dot(lightDir,normalize(ray.Ng)),0.0f,1.0f)); else pixels[y*width+x] = colors[ray.primID]*0.5f; } Embree Rendering: ISPC Example 36
  • 33.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Dynamic Scenes  Create scene with RTC_SCENE_DYNAMIC flag  Report modified meshes with rtcUpdate call  Possibly enable (rtcEnable), disable (rtcDisable), add (rtcNewXX), and delete (rtcDeleteGeometry) geometries for each frame { for each dynamic mesh { /* modify shared buffers */ modify mesh->indices modify mesh->vertices /* signal mesh update */ rtcUpdate(scene,mesh); } /* commit changes */ rtcCommit (scene); /* trace rays */ ... } 37
  • 34.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Per geometry callback that is called during traversal for each primitive intersection  Callback can accept or reject hit  Can be used for: – Trimming curves (e.g. modeling tree leaves) – Transparent shadows (reject and accumulate) – Find all hits (reject and collect) Intersection Filter Functions 42
  • 35.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. /* procedural intersection filter function */ void intersectionFilter(void* userPtr, RTCRay& ray) { Vec3fa h = ray.org + ray.dir*ray.tfar; float v = abs(sin(4.0f*h.x)*cos(4.0f*h.y)*sin(4.0f*h.z)); float T = clamp((v-0.1f)*3.0f,0.0f,1.0f); if (T > 1.0f) return; // accept hit ray.geomID = RTC_INVALID_GEOMETRY_ID; // reject hit } /* set intersection filter for the cube */ rtcSetIntersectionFilterFunction(scene, geomID, (RTCFilterFunc)&intersectionFilter); rtcSetOcclusionFilterFunction (scene, geomID, (RTCFilterFunc)&intersectionFilter); rtcSetUserData (scene, geomID, NULL); Filter Function Example 43
  • 36.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Hair curves represented as cubic bezier curves with varying radius  High performance through use of oriented bounding boxes  Low memory consumption through direct ray/curve intersection Hair Geometry 44 p0/r0 p1/r1 p2/r2 p3/r3
  • 37.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Catmull Clark Subdivision Surfaces 8/18/2015 45
  • 38.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Converts coarse mesh into smooth surface by subdivision  Generalization of bi-cubic B-Spline surfaces to arbitrary topology  Embree is compatible with OpenSubdiv 3.0 Catmull Clark Subdivision Surfaces 46 46
  • 39.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Catmull Clark Subdivision 47
  • 40.
    Copyright © 2015,Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Low resolution base mesh controls high resolution surface  Smoothness always guaranteed (C2 continous almost everywhere)  Support for arbitrary topology (no trimming required as with NURBS)  Creases allow introducing sharp features  Support in most modeling tools  Established as standard in movie production CC Subdivision Surface Advantages Inside-Out (2015) Pixar, Walt Disney Pictures 48
  • 41.
    Embree Subdivision Features
    • Semi-sharp edge creases
    • Semi-sharp vertex creases
    • Vertex attribute interpolation
    • Tessellation level per edge
    • Non-manifolds and holes
    • Boundary modes
    • Triangles, quads, pentagons, ...
    • Displacement mapping
  • 42.
    Embree Subdivision Example
    unsigned geomID = rtcNewSubdivisionMesh (scene, RTC_GEOMETRY_STATIC,
                                             numFaces, numIndices, numVertices,
                                             numEdgeCreases, numVertexCreases, numHoles);
    rtcSetBuffer (scene,geomID,RTC_VERTEX_BUFFER, vertices, 0, sizeof(float3));
    rtcSetBuffer (scene,geomID,RTC_INDEX_BUFFER , indices, 0, sizeof(int));
    rtcSetBuffer (scene,geomID,RTC_FACE_BUFFER , faces, 0, sizeof(int));
    rtcSetBuffer (scene,geomID,RTC_LEVEL_BUFFER , levels, 0, sizeof(float));
    rtcSetBuffer (scene,geomID,RTC_EDGE_CREASE_INDEX_BUFFER,...);
    rtcSetBuffer (scene,geomID,RTC_EDGE_CREASE_WEIGHT_BUFFER,...);
    rtcSetBuffer (scene,geomID,RTC_VERTEX_CREASE_INDEX_BUFFER,...);
    rtcSetBuffer (scene,geomID,RTC_VERTEX_CREASE_WEIGHT_BUFFER,...);
    rtcSetBuffer (scene,geomID,RTC_HOLE_BUFFER,holes,0,sizeof(char));
  • 43.
    Embree Subdivision Implementation
    • Feature adaptive subdivision to evaluate patch (subdivides only at irregular vertices and crease features)
    • Fast path for B-Spline patches and Gregory patches
    • Tessellation cache limits memory consumption (trade memory for performance)
    Figure: feature adaptive subdivision into B-Spline patches (green) and Gregory patches (blue)
  • 44.
    Embree Subdivision Performance
    (columns A, B, C correspond to the three scenes pictured on the slide)
                      A         B         C
    Patches           16        52k       53k
    Edge Creases      0         0         30k
    Micro Quads       1048k     831k      837k
    Walkthrough       32 fps    36 fps    23 fps
    Same View         66 fps    51 fps    56 fps
    Intel® Xeon® E5-2690 2.9 GHz, 2x 8 cores, 1024 x 1024 pixels
  • 45.
    Vertex Data Interpolation
    • Interpolates arbitrary user data over geometries (non-trivial for subdivision geometries)
    • Interpolated data P as well as dPdu and dPdv can be calculated at arbitrary locations
    • Enables smooth normals and anisotropic texture lookups
    • Different rules for interpolation of texture coordinates supported (by evaluation of a second subdiv mesh)
  • 46.
    Vertex Data Interpolation Example
    rtcNewScene (RTC_STATIC, RTC_INTERSECT1 | RTC_INTERPOLATE);
    ...
    unsigned geomID = rtcNewSubdivisionMesh (...);
    rtcSetBuffer (scene,geomID,RTC_INDEX_BUFFER, indices, 0, sizeof(int));
    rtcSetBuffer (scene,geomID,RTC_VERTEX_BUFFER, vertices, 0, sizeof(float3));
    rtcSetBuffer (scene,geomID,RTC_USER_VERTEX_BUFFER, vertex_colors, 0, sizeof(float3));
    ...
    rtcCommit (scene);
    ...
    rtcIntersect (scene, ray);
    ...
    float3 P, dPdu, dPdv;
    rtcInterpolate (scene, geomID, primID, ray.u,ray.v, RTC_VERTEX_BUFFER, &P, &dPdu, &dPdv, 3);
    float3 color;
    rtcInterpolate (scene, geomID, primID, ray.u,ray.v, RTC_USER_VERTEX_BUFFER, &color, 0,0, 3);
  • 47.
    Displaced Subdivision Surface
    • Support for vector displacement
    • Callback function displaces vertex positions
    • Bounded displacement allows for lazy evaluation
    • Smooth normals possible through approximation:
        Q    = P + D*Ng
        dQdu ≈ dPdu + dDdu*Ng
        dQdv ≈ dPdv + dDdv*Ng
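The normal approximation on this slide can be sketched in a few lines. The approximation drops the term D*dNg/du (and D*dNg/dv) that the exact product rule would add, so it needs only the tangents of the undisplaced surface, its geometry normal Ng, and the derivatives of the scalar displacement D. The `Vec3` type and `displacedNormal` function below are hypothetical helpers for illustration, not Embree API; the final cross product and normalization are the usual way to turn two tangents into a shading normal and are an assumption, as the slide does not spell that step out.

```cpp
#include <cmath>

// Hypothetical minimal vector type for this sketch (not an Embree type).
struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x+b.x, a.y+b.y, a.z+b.z}; }
inline Vec3 operator*(float s, Vec3 a) { return {s*a.x, s*a.y, s*a.z}; }

inline Vec3 cross(Vec3 a, Vec3 b)
{
  return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x};
}

inline Vec3 normalize(Vec3 a)
{
  const float len = std::sqrt(a.x*a.x + a.y*a.y + a.z*a.z);
  return {a.x/len, a.y/len, a.z/len};
}

// Approximate normal of the displaced surface Q = P + D*Ng:
//   dQdu ~ dPdu + dDdu*Ng,  dQdv ~ dPdv + dDdv*Ng
// Ng is assumed to be the unit normal of the undisplaced surface.
inline Vec3 displacedNormal(Vec3 dPdu, Vec3 dPdv, Vec3 Ng, float dDdu, float dDdv)
{
  const Vec3 dQdu = dPdu + dDdu*Ng;
  const Vec3 dQdv = dPdv + dDdv*Ng;
  return normalize(cross(dQdu, dQdv));
}
```

For a flat patch with zero displacement derivatives the result equals Ng; a displacement that grows along u tilts the normal away from the +u direction.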
  • 48.
    Displaced Subdivision Surface Example
    void displacementFunction(
      void* ptr, int geomID, int primID,
      const float* u, const float* v,
      const float* nx, const float* ny, const float* nz,
      float* px, float* py, float* pz, size_t N)
    {
      for (size_t i = 0; i<N; i++) {
        float D = displacement(...);
        px[i] += D*nx[i];
        py[i] += D*ny[i];
        pz[i] += D*nz[i];
      }
    }
    BBox3fa bounds(...);
    rtcSetDisplacementFunction (scene,geomID,displacementFunction,&bounds);
  • 49.
    Summary
    • Embree delivers highest ray tracing performance on CPUs
    • Embree can speed up many ray tracing applications
    • Embree is easy to use through its API
    • Subdivision surface support compatible with OpenSubdiv 3.0
    • Free and Open Source (https://embree.github.com)
  • 50.
    Questions?
    https://embree.github.io
    embree@googlegroups.com
  • 51.
  • 52.
    Technical Overview
  • 53.
    Monte Carlo Ray Tracing
    • Stochastic integration of all potential light paths between source and pixel
    • Follows light backwards from pixel to light source
    • Produces incoherent ray distributions
    Figure labels: Light, Pixel
  • 54.
    Two Kinds of Ray Distributions
    • Incoherent rays (typical for Monte Carlo)
    • Coherent rays
  • 55.
    BVH Acceleration Structure
  • 56.
    BVH Acceleration Structure
  • 57.
    BVH Acceleration Structure
  • 58.
    BVH Acceleration Structure
  • 59.
    BVH Acceleration Structure
  • 60.
    Solution Space for Vectorized Ray Tracing
    Approaches: Single Ray SIMD Traversal, Scalar Traversal, Packet Traversal, Independent Ray Traversal
    Axes: Single Ray vs. Multi Ray; Single Box vs. Multi Box
  • 61.
    BVH4 Spatial Index Structure
    struct Node4 {
      ssef minx, miny, minz;
      ssef maxx, maxy, maxz;
      Node4* child[4];
    };
    struct Triangle4 {
      ssef v0x,v0y,v0z;
      ssef e1x,e1y,e1z;
      ssef e2x,e2y,e2z;
      ssef Nx,Ny,Nz;
      ssei id0,id1;
    };
  • 62.
    BVH4 Traversal
    For each dimension:
    • Intersect ray with near plane of each box in SIMD
    • Intersect ray with far plane of each box in SIMD
    • Clip the near and the far parameters (box hit if near <= far)
    Figure labels: near1 to near4, far1 to far4
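The per-dimension steps above amount to the classic slab test applied to four boxes at once. The following is a scalar sketch of that test, with a loop standing in for the four SIMD lanes. `Box4`, `SimpleRay`, and `intersectBox4` are hypothetical names for this illustration; the layout only mimics the structure-of-arrays idea of `Node4`, and near/far planes are selected with min/max here rather than by the ray's direction sign. The sketch assumes all direction components are nonzero, so that the reciprocal direction is finite.

```cpp
#include <algorithm>

// Hypothetical scalar stand-in for a 4-wide node: four child boxes
// stored in structure-of-arrays layout, one array entry per child.
struct Box4 {
  float minx[4], miny[4], minz[4];
  float maxx[4], maxy[4], maxz[4];
};

// Hypothetical ray type for this sketch.
struct SimpleRay {
  float org[3];   // origin
  float rdir[3];  // reciprocal direction (1/dir), assumed finite
  float tnear, tfar;
};

// Returns a 4-bit mask; bit i is set if the ray hits child box i.
inline int intersectBox4(const Box4& n, const SimpleRay& r)
{
  int mask = 0;
  for (int i = 0; i < 4; i++) { // each iteration corresponds to one SIMD lane
    const float t0x = (n.minx[i] - r.org[0]) * r.rdir[0];
    const float t1x = (n.maxx[i] - r.org[0]) * r.rdir[0];
    const float t0y = (n.miny[i] - r.org[1]) * r.rdir[1];
    const float t1y = (n.maxy[i] - r.org[1]) * r.rdir[1];
    const float t0z = (n.minz[i] - r.org[2]) * r.rdir[2];
    const float t1z = (n.maxz[i] - r.org[2]) * r.rdir[2];
    // Entry parameter: latest entry over all three slabs, clipped to tnear.
    const float tNear = std::max(std::max(std::min(t0x, t1x), std::min(t0y, t1y)),
                                 std::max(std::min(t0z, t1z), r.tnear));
    // Exit parameter: earliest exit over all three slabs, clipped to tfar.
    const float tFar  = std::min(std::min(std::max(t0x, t1x), std::max(t0y, t1y)),
                                 std::min(std::max(t0z, t1z), r.tfar));
    if (tNear <= tFar) mask |= 1 << i;
  }
  return mask;
}
```

Returning a bitmask instead of a list of indices matches the way a SIMD comparison result is typically condensed into an integer before the traversal decides which children to visit.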
  • 63.
    Optimizing BVH4 Traversal for CPUs
    • Reduce number of executed instructions
    • Reduce data dependencies of critical paths
    • Take advantage of special instructions (e.g. SSE, bitscan, etc.)
    • Optimize most frequently executed code paths
  • 64.
    Optimizing BVH4 Traversal for CPUs (SSE)
    • Load front/back plane based on direction sign of the ray
    • Balanced min/max trees
    • Bitscans to iterate through hit children
    • Early exit for 0 children hit (20%)
    • 1 child hit (50%): keep next node in register (instead of push/pop sequence)
    • 2 children hit (20%): keep next node in register, sort using a branch
    while (true)
    {
      if (isLeaf(node)) goto leaf;
      ssef nearX = (norg.x + node[nearX]) * rdir.x;
      ssef nearY = (norg.y + node[nearY]) * rdir.y;
      ssef nearZ = (norg.z + node[nearZ]) * rdir.z;
      ssef farX = (norg.x + node[farX ]) * rdir.x;
      ssef farY = (norg.y + node[farY ]) * rdir.y;
      ssef farZ = (norg.z + node[farZ ]) * rdir.z;
      ssef near = max(max(nearX,nearY), max(nearZ,ray.near));
      ssef far = min(min(farX,farY), min(farZ,ray.far));
      int hitmask = movemask(near <= far);
      if (hitmask == 0) goto pop;
      int c = bitscan(hitmask);
      hitmask = clearbit(hitmask,c);
      if (hitmask == 0) { node = node.child[c]; continue; }
      …
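The "bitscans to iterate through hit children" idea can be shown in isolation. The sketch below walks the set bits of a hit mask from lowest to highest. The `bitscan` here is a portable loop standing in for the hardware bit-scan instruction, and `clearLowestBit` and `hitChildren` are hypothetical helpers for this illustration; none of them are the functions used in the slide code, whose remainder is elided.

```cpp
#include <cassert>

// Portable stand-in for a hardware bitscan: index of the lowest set bit.
// The mask must be nonzero.
inline int bitscan(unsigned mask)
{
  assert(mask != 0);
  int i = 0;
  while ((mask & 1u) == 0) { mask >>= 1; ++i; }
  return i;
}

// Clears the lowest set bit of the mask.
inline unsigned clearLowestBit(unsigned mask) { return mask & (mask - 1); }

// Writes the indices of all set bits (at most 4) into out, in ascending
// order, and returns how many were written. Only the hit children are
// visited; the loop never tests children that were missed.
inline int hitChildren(unsigned hitmask, int out[4])
{
  int n = 0;
  while (hitmask != 0 && n < 4) {
    out[n++] = bitscan(hitmask);
    hitmask = clearLowestBit(hitmask);
  }
  return n;
}
```

Because the loop runs once per set bit, the common cases listed on the slide (zero, one, or two children hit) finish after zero, one, or two iterations.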