Out-of-core rendering of massive data sets
Review Paper
Kurt Portelli
Department of Computer Science, Faculty of Information & Communications Technology, University of Malta
kpor0005@um.edu.mt

Mr. Sandro Spina (Supervisor)
Department of Computer Science, Faculty of Information & Communications Technology, University of Malta
sandro.spina@um.edu.mt

Mr. Keith Bugeja (Co-Supervisor)
Department of Computer Science, Faculty of Information & Communications Technology, University of Malta
keith.bugeja@um.edu.mt
ABSTRACT
With the advancement of computer-aided design technologies and 3D capturing devices, complex realistic geometric models can be created that contain millions of triangles. Examples of such models include architectural buildings, complex CAD structures and 3D scans of heritage sites. These 3D models pose a number of challenges to current graphics systems when used for interactive display and manipulation. In this work we make use of out-of-core techniques to enable rendering of massive data sets. Our rendering framework makes use of ray-casting and an acceleration data structure. The algorithm renders models
without loading them entirely into memory. Models are first imported into the framework and organized into a tree-based acceleration data structure whose leaves contain the models' data. Depending on memory availability, two caches are created in main memory: one for the acceleration structure and another for the data stored at the leaf nodes. Several cache replacement heuristics, each with a different policy for managing and evicting cached data, are tested to determine the best approach for different scenes. Figure 7 in Section 4 shows a series of images of the power plant model [13] rendered by our system. The model contains about 12 million triangles, and each 512×512 image is rendered in about 21 seconds using the LRU cache replacement policy.
1. INTRODUCTION
Designers and 3D artists can nowadays create large and complex 3D environments using readily available tools. It is not uncommon to generate gigabyte-sized data sets such as power plants, ships and airplanes. These models are then used for simulations and design reviews [2]. Displaying such models is a major challenge, especially when it must be done interactively.
(a) A highly detailed model taken
from [2].
(b) A low detailed model taken
from [9].
Figure 1
In order to visualise these 3D models, data representing geometry and material information is first loaded into main memory. However, if the model is larger than the available memory, out-of-core algorithms must be used to display it. Out-of-core algorithms are algorithms designed to process data larger than the memory available. Some of the techniques which help alleviate the problem and accelerate the generation of each frame are model simplification [15], visibility culling [8], level of detail [5], image or sample approximations [4] and occluder databases [6].
To keep up with the increasing complexity of 3D models, more advanced algorithms and better hardware acceleration are continuously being developed. Every performance gain is spent on rendering more geometric surface detail (more model primitives). One can easily see the difference between the models in figures 1a and 1b: figure 1a has much more detail due to its higher primitive count, but as a downside it consumes more memory and takes longer to render, since its model has millions of triangle primitives while the one in 1b has only a few hundred. In general, since many operations (e.g. lighting, movement) are applied to a model's set of primitives, an increase in the number of primitives implies an increase in the computation time required.
Rasterization is the most popular method for interactive rendering, used mainly through OpenGL and DirectX. It takes the model's 3D geometric description and transforms it into two-dimensional points [12] inside the view frame, and it has been implemented very efficiently in hardware. This hardware is the graphics card (or Graphics Processing Unit), whose performance has increased over time, allowing us to display ever larger and more complex models. The main challenge in this method is determining visibility: identifying the regions of surfaces in a scene that are visible from the virtual camera. This requires a lot of computational power, and data parallelism is used to attain a high frame rate. The increasing demand for rendering realistic images means that such systems must handle very complex geometry. Instead, approximations are sometimes used to reduce the complexity of this geometry while still rendering realistic images. These approximations, however, still lack detail when compared to full-resolution geometry. E. Luong [7] argues that bumpy or displaced surfaces cannot be reproduced faithfully using approximations, especially as the bar for image quality rises.
To render bumpy surfaces one can either model a complex geometric surface or use a bump mapping technique [12]. Bump mapping is an example of such an approximation: the surface remains flat, but during lighting calculations its surface normals are perturbed to fake depth.
When there is a need to render highly detailed models, these approximations might not produce the expected outcome. Thus, in order to model highly complex and detailed objects, a huge number of geometric primitives (e.g. triangles) is used to define the model's surface geometry. A model containing millions of geometric primitives takes up a lot of memory, may not fit in memory at all, and slows the whole rasterization process down. A straightforward solution would be to increase the memory of the graphics card, but there are other ways to tackle this problem, and these are discussed in this paper.
The aim of this work is to develop an out-of-core rendering framework capable of loading massive models with limited memory available. This aim is further split into these milestones:
• Create a rendering system based on ray casting. This ray casting system uses an acceleration data structure to minimize the ray intersection search space, thus rendering at a higher frame rate.
• Extend the rendering system to hold massive data sets that are larger than the available memory, using out-of-core rendering techniques.
• Design and evaluate different cache replacement policies within the system by simulating different scenarios and putting constraints on resources.
2. BACKGROUND
We now describe techniques and algorithms that will help the reader better understand the rest of the paper. The approach taken is to render the scene using ray casting, with an acceleration structure to speed up ray intersection tests. The framework created is able to load into main memory only those parts of the acceleration structure that are actually in use. This enables the framework to render models that are larger than the main memory.
The aim of a rendering method is to generate an image of a three-dimensional scene and display it on screen; to do so, we first need to define a virtual camera.

Figure 2: Virtual Camera Definition

Figure 3: Depiction of a ray cast taken from [3].

The virtual camera acts as the user's eyes inside this 3D space.
A camera, as shown in figure 2, is created as a point in 3D space with a direction. In addition, vectors indicating which way is up and which way is right are needed to compute the view plane. The camera direction vector signifies where the camera is pointing; the position and the other three vectors together define the camera. When the scene geometry is static, as in our case, animation occurs by updating these camera vectors. Two functions were added: one lets the camera move forward and the other rotates the camera around the origin.
While rendering, a frame buffer is used to hold the contents
of all the pixels inside the particular frame. When all the
pixels are rendered, the system is ready to display the image
and then replace the contents of the frame buffer with the
contents of the next frame. This cycle continues until all the
required frames are rendered.
2.1 Rendering method
The use of ray casting for rendering 3D scenes was introduced in 1968 by Arthur Appel [1]. Rays are generated from the virtual camera and shot through each pixel of the view plane. The closest object hit is determined, and the ray's color is calculated from that object's properties. Appel also hinted at the use of shadow rays: by shooting rays towards the light sources, one can determine whether a light source is occluded and use this to influence the shading of the primary rays.
Ray casting simply returns the triangle hit by the primary ray. More advanced designs also take into consideration light sources in the scene.

Figure 4: Ray - Triangle Intersection

Figure 5: Shooting rays towards the pixels inside the image plane

A light source emits rays of light, as depicted in figure 3. A ray can be seen as a stream of photons travelling along a straight line (in a vacuum), and when it hits a surface three things can happen: absorption, reflection and refraction.
In ray casting the process stops when an object is hit, as shown in figure 4, while in ray tracing all the reflected and refracted rays are followed as well. Ray tracing is thus much more computationally intensive [11], since there are more rays to compute and keep track of.
The number of rays shot into the 3D scene is determined by the width and height of the image we want to generate. Multiplying the two gives the total number of pixels in the image, or rather the total number of rays we need to shoot into the scene to generate a complete frame; a 1920 × 1080 (width × height) resolution results in a total of about 2 million rays. For each pixel in the generated image a ray is created: its direction is computed by subtracting the camera position from the 3D pixel location, as shown in figure 5. The last step for each ray is to check for its closest intersection, if any, with the 3D model. When all the rays have been processed, the image can be colored by checking the distance between the camera position and the hit location for each pixel, effectively computing a depth map.
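As an illustration of the per-pixel ray setup just described, the following sketch builds a ray for one pixel. The camera basis vectors and the unit focal distance are illustrative assumptions, not the framework's actual parameters.

```python
import math

def generate_ray(x, y, width, height, cam_pos, cam_dir, cam_up, cam_right):
    """Build the ray for pixel (x, y): point on the view plane minus camera position."""
    # Map the pixel centre to [-1, 1] view-plane coordinates (square pixels assumed).
    u = (2.0 * (x + 0.5) / width) - 1.0
    v = 1.0 - (2.0 * (y + 0.5) / height)
    # 3D pixel location: one unit along the view direction, offset along right/up.
    pixel = [cam_pos[i] + cam_dir[i] + u * cam_right[i] + v * cam_up[i]
             for i in range(3)]
    # Ray direction = pixel location - camera position, normalized.
    d = [pixel[i] - cam_pos[i] for i in range(3)]
    n = math.sqrt(sum(c * c for c in d))
    return cam_pos, [c / n for c in d]
```

Repeating this for every pixel of a 1920 × 1080 frame yields the roughly 2 million rays mentioned above.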
Models measuring gigabytes contain millions of primitives, so checking for the closest intersection against all of them would be too slow; an acceleration structure is therefore used. The acceleration structure allows us to search the model in logarithmic time, rendering each frame faster.
2.2 Acceleration Data structure
An acceleration data structure is a data container that lets us retrieve the data we actually need in, on average, logarithmic time. During traversal each ray needs to return the closest intersection, so the speed of each ray intersection determines the total time to render a frame. Storing all the model's triangle primitives as a flat list is not efficient: since large models contain millions of triangles, each frame would take a very long time to render. The aim of an acceleration data structure is to accelerate this process by returning a small subset of the object's triangles, namely only the ones close to the ray, reducing the total number of intersection checks. This speeds up the computation of each pixel and generates a frame in much less time.
2.3 Memory management
An out-of-core rendering system requires some form of memory management, as data is dynamically moved from secondary storage to main memory. New data is also computed at runtime and needs to be stored, while at the same time memory usage must be limited so that the system does not run out of memory. Such a system needs to handle quantities of data that do not fit in memory.
A memory limit is enforced by the memory management system, and data is allocated within it accordingly; this includes triangles, vertices, the acceleration data structure, camera properties and frame buffers. For the large models tested in this dissertation, containing millions of triangles, one cannot load everything into memory. The memory management system allows dynamic allocation of data so that unused data can be swapped out in favour of the data currently needed from secondary storage. Page faults occur when the system queries data that is not currently in memory, so the system needs to load data on demand, at runtime, as these faults happen. Memory-mapped files are used to load and save data on secondary storage and are treated as virtual memory by the out-of-core rendering framework.
Each page fault decreases performance, since retrieving data from secondary storage is very time consuming; the system therefore tries to minimize the number of page faults by using efficient cache replacement policies. Main memory is treated as a cache for all the model's data: only data that is in use, or considered important by the replacement policy, is kept in memory. While the cache (the allocated memory) has free space, requested data is stored immediately; when memory is full and new data must be stored, some of the old data is evicted according to the replacement policy. Part of the evaluation in this dissertation is to compare multiple cache policies and determine the best one for a particular scene.
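As a sketch of one of the policies compared later, an LRU cache of the kind described above can be built on an ordered dictionary. The capacity and the `load_fn` callback standing in for secondary-storage retrieval are illustrative, not the dissertation's implementation.

```python
from collections import OrderedDict

class LRUCache:
    """Keeps at most `capacity` entries; evicts the least recently used on overflow."""
    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn        # fetches a node from secondary storage on a miss
        self.entries = OrderedDict()  # insertion order doubles as recency order
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)     # hit: mark as most recently used
            return self.entries[key]
        self.misses += 1                      # page fault: fetch from disk
        value = self.load_fn(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value
```

The FIFO, Frequency and Decaying Frequency policies differ only in which entry the last step evicts.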
2.4 Parallelism
The out-of-core rendering framework can also exploit multithreading in different areas of the system to gain performance. Ray casting, as explained before, shoots a ray for each pixel; thus each pixel can be rendered independently in a different thread. The rendering
process and memory management can also be split into dif-
ferent threads so that memory management does not hold
back the rendering process when a page fault occurs. This
will be explained in detail in section 3.2.1.
Although splitting the work across threads can increase performance, this must be done with care, since the threads share memory. If threads modify shared data concurrently, different versions of the data may exist, producing inconsistent results. Race conditions occur when the ordering of thread execution affects the outcome. To solve this problem, thread synchronization is used: each thread waits for other threads to finish using a shared resource, ensuring that no thread accesses shared data currently in use and that correct results are generated. Thread synchronization can also slow the process down, since threads must now wait for each other while executing critical sections. Thus the critical section (where shared resources are used), which is the sequential component, must be kept small and fast so that other threads do not spend a lot of time waiting.
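A minimal sketch of the pixel-level parallelism described above, assuming a trivial placeholder `trace` function: the lock-protected counter stands in for the shared state discussed in the text, while the framebuffer writes need no lock because each pixel is independent.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 4, 4
framebuffer = [0.0] * (WIDTH * HEIGHT)
stats_lock = threading.Lock()     # guards the shared counter (the critical section)
rays_traced = 0

def trace(x, y):
    # Placeholder for the real ray cast; returns a fake depth value.
    return float(x + y)

def render_pixel(idx):
    global rays_traced
    x, y = idx % WIDTH, idx // WIDTH
    framebuffer[idx] = trace(x, y)  # each pixel writes its own slot: no lock needed
    with stats_lock:                # shared data: keep the critical section short
        rays_traced += 1

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(render_pixel, range(WIDTH * HEIGHT)))
```

Keeping the locked region to a single increment is exactly the "concise and fast critical section" advice above.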
3. DESIGN AND IMPLEMENTATION
3.1 kd-tree
The aim of storing the primitives inside a structure is to return only the primitives closest to the ray. This implicitly reduces the number of ray-primitive intersections and speeds up the whole process. It is important to note that although such a structure speeds up rendering, it incurs a memory overhead; each node should therefore be as small as possible to minimize this overhead. The nodes must be stored in a way that lets the system access them quickly and exploit the operating system's caching mechanisms to the full. The acceleration structure used is the kd-tree, since in practice kd-trees have been shown to perform best most of the time, owing to their simplicity and their ability to adapt to scene complexity [16].
3.1.1 Construction
The kd-tree construction algorithm was implemented recursively. Each invocation chooses a split plane and partitions the triangles between two child nodes, until a termination condition is met and a leaf node is returned. The choice of split plane determines the efficiency and size of the kd-tree. Construction takes a long time, but it is done only once, at the beginning, and the tree is then reused for every frame.
3.1.2 Traversal
The traversal algorithm is called once for every pixel, so it needs to be very efficient and optimized. Kd-trees offer very simple traversal code, which makes further optimization easier. Ingo Wald describes a highly optimized traversal routine in pseudocode form in Realtime Ray Tracing and Interactive Global Illumination [16]. Since his implementation is well tested and, judging by his published results, among the best available, we based most of our traversal algorithm on his work. When shooting a ray down a kd-tree we only need to make binary choices: left node or right node.
Figure 6: 3 different traversal scenarios
Whether to traverse left or right is determined by how the ray hits the split plane, as shown by the three scenarios in figure 6. This method also lets us use culling techniques: since we can determine which side of the split plane is hit first, part of the tree can be skipped. The implementation is very efficient because all computation is done on ray segments, which are one-dimensional computations [16]. The actual 3D coordinates of the entry, exit and intersection points are never computed, which greatly improves performance.
3.1.3 Triangle Intersection
This code runs thousands of times per second, so it needs to be highly optimized and efficient. The algorithm used is the one designed by Möller and Trumbore [14]. A ray R(t) is defined as R(t) = O + tD, with O the origin and D the normalized direction vector. A triangle is defined by three vertices V0, V1, V2. Our algorithm needs to find the t at which the ray intersects the triangle's plane and then check whether the intersection point lies within the three vertices. Using barycentric coordinates [10] is the fastest known way to determine whether an intersection lies within the triangle and what its exact coordinates are. In our implementation the only quantity needed was the distance from the camera position to the intersection, so that the colour could be calculated; the exact 3D location of the intersection was not required. The idea behind barycentric coordinates can be seen as placing weights on the triangle's vertices; these weights then determine a point inside or outside the triangle.
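A sketch of the Möller-Trumbore test [14] as described above: it returns the ray parameter t (from which the depth can be derived) together with the barycentric coordinates u and v, or None on a miss.

```python
def ray_triangle(orig, dirn, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore: solve O + tD = (1-u-v)V0 + uV1 + vV2 for t, u, v."""
    def sub(a, b): return [a[i] - b[i] for i in range(3)]
    def cross(a, b): return [a[1]*b[2] - a[2]*b[1],
                             a[2]*b[0] - a[0]*b[2],
                             a[0]*b[1] - a[1]*b[0]]
    def dot(a, b): return sum(a[i] * b[i] for i in range(3))

    e1, e2 = sub(v1, v0), sub(v2, v0)  # triangle edge vectors
    p = cross(dirn, e2)
    det = dot(e1, p)
    if abs(det) < eps:                 # ray parallel to the triangle's plane
        return None
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:             # barycentric test: outside edge V0V1 side
        return None
    q = cross(s, e1)
    v = dot(dirn, q) * inv
    if v < 0.0 or u + v > 1.0:         # barycentric test: outside the triangle
        return None
    t = dot(e2, q) * inv
    return (t, u, v) if t > eps else None  # hit only if in front of the origin
```

With a normalized direction, t is exactly the camera-to-hit distance used for the depth map.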
3.2 System Cache
A fixed size for the number of nodes and the amount of leaf data stored in memory had to be set so that the system can run on different operating system configurations. By leaf data we mean the list of triangles contained in a leaf node; taking all the leaf nodes' lists and storing them as one whole list takes up a lot of space, even gigabytes. Thus a limit was set on the space used to store the nodes and the triangle lists. Two separate caches were created, each with its own cache replacement policy, and each can be set to a different size. Two other very important arrays are kept in memory: the triangles and the vertices of the currently loaded 3D model. These two arrays are accessed constantly and need to offer the best possible access time, so they are stored as plain arrays and always kept in memory.
3.2.1 Loading
Frame   Decaying Frequency   FIFO   Frequency   LRU
  1            3.4            3.8      3.0       3.8
  3            7.1           10.9      6.4      11.1
  6            6.7           11.1      6.1      11.4
  9            6.1           10.5      5.5      10.7
 12            5.7           10.2      5.2      10.2
 15            5.2            9.5      4.7       9.6

Table 1: Ray casts per second (in thousands) for a 256 MB cache
Frame   Decaying Frequency   FIFO   Frequency   LRU
  1            4.4            4.6      4.4       4.6
  3           10.9           11.1     11.2      11.5
  6           11.1           12.0     11.1      12.0
  9           10.5           11.6     10.5      11.5
 12           10.1           11.2     10.1      11.3
 15            9.6           11.0      9.4      11.1

Table 2: Ray casts per second (in thousands) for a 768 MB cache
Both caches work in a very similar way, since the methods they expose are very similar: checking whether a node is inside the cache, and adding a node to the cache. These methods hide all the complications that arise when a node is not in the cache and needs to be retrieved from secondary storage. The system offers two loading modes, synchronous and asynchronous. With synchronous loading, the system waits for the cache to load the node when a cache miss occurs. Asynchronous loading queues the current pixel and continues rendering other pixels until the needed data has been loaded from secondary storage.
3.2.2 Configuration
The cache was designed to be as extensible as possible, so that multiple cache replacement policies could be implemented and tested. The replacement policies implemented are First In First Out (FIFO), Least Recently Used (LRU), Decaying Frequency and Frequency. All the caches use set associativity as their mapping technique, since it gives enough flexibility to implement the replacement policies.
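Set associativity as used above can be sketched as follows (an illustration, not the system's code): each key hashes to exactly one set, and the replacement policy, FIFO here for brevity, operates only within that set.

```python
class SetAssociativeCache:
    """num_sets sets of `ways` slots each; FIFO replacement within a set."""
    def __init__(self, num_sets, ways, load_fn):
        self.num_sets, self.ways, self.load_fn = num_sets, ways, load_fn
        self.sets = [[] for _ in range(num_sets)]  # each entry: (key, value)

    def get(self, key):
        s = self.sets[hash(key) % self.num_sets]   # a key maps to exactly one set
        for k, v in s:
            if k == key:                           # hit: found within its set
                return v
        value = self.load_fn(key)                  # miss: load from disk
        if len(s) >= self.ways:
            s.pop(0)                               # FIFO: evict the oldest in this set
        s.append((key, value))
        return value
```

Restricting eviction to one small set keeps the policy bookkeeping cheap while still allowing LRU or frequency-based variants per set.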
4. EVALUATION AND RESULTS
To evaluate the system, multiple tests were carried out using different models and cache configurations while varying the amount of memory the system may use. The same camera path is used throughout all tests for a consistent evaluation: a 512 × 512 frame is rendered every two degrees, and the path produces 15 frames, rotating through a total angle of 30 degrees. Two sets of results are generated; the first uses a single thread for both rendering and loading missing data, while the second uses two threads, one for loading and the other for rendering the scene. The aim of these tests is to determine the best technique and cache replacement policy for this particular camera path.
Tables 1 and 2 show the number of ray casts per second (in thousands) the system achieves. The fastest frame was rendered at around 12k rays per second, which means it was generated in about 21 seconds. Loading the whole model together with the acceleration data structure would take around 4.6 GB. The tests loaded all the triangles and vertices and then used the two additional caches, with the cache size varied between tests: in the first test the limit was set to 256 MB and in the second to 768 MB. Thus, in the first test we were using only ((256 MB × 2) + 200 MB) / 4410 MB × 100 ≈ 16% of the memory that would be needed if no out-of-core techniques were used. Note that the extra 200 MB added to the space consumed by the caches is the space taken by the triangles and vertices loaded into memory. In the second test scenario we loaded 39% of the total data into memory, so better performance was obtained, as can be seen in table 2. Fewer cache misses occurred as the cache size increased, since more data was kept in memory.

Frame   LRU Asynch   LRU Synch
  1         4.3          4.6
  3        11.3         11.5
  6        11.6         12.0
  9        11.3         11.5
 12        11.1         11.3
 15        10.8         11.1

Table 3: Ray casts per second (in thousands) for a 768 MB cache
The first thing one notices in tables 1 and 2 is the performance difference between the first frame and the rest: during the first frame data is still being loaded into memory, since the leaf-data cache starts out empty. The cache replacement policies that gave the best results were FIFO and LRU. An interesting observation is that when the cache was very small relative to the 3D model, the Frequency and Decaying Frequency policies caused many cache misses and were in fact the worst performers; as the cache size increased they gained a major performance boost and caught up with the other replacement policies.
Table 3 shows the rays per second (in thousands) when running the system on the power plant model using the LRU cache, loading data both synchronously and asynchronously. The cache size is also varied, and one can easily see the performance boost when the size is increased. Analysing the tables, one notices that asynchronous loading gives no performance boost in most cases, even though in theory it should, since the CPU would spend less time idle. The main reason no improvement is seen is the additional complexity the system needs in order to load data asynchronously and still generate correct results: with the current implementation, asynchronous loading causes more cache misses, because data is replaced before it has been used and is then queued again for loading.
5. CONCLUSION
We have presented an out-of-core rendering system which
renders massive data sets and lets the user navigate through
the 3D space using a virtual camera. The system uses ray
casting as a rendering method and an acceleration data
structure to increase the frame rate. The acceleration data structure used is the kd-tree.

Figure 7: Power Plant images

Depending on the amount of memory available, our rendering framework loads, on demand, different parts of the structure and the model's data. The
system’s main memory is used as a temporary cache where
the data required to render a frame is loaded. A number
of cache replacement policies have been implemented, which
try to minimise the amount of cache swaps at runtime be-
tween successive frames. The framework also has the option
of spawning an extra thread which does the loading from
secondary storage separately with the aim of improving ren-
dering speeds.
From the results obtained, the cache replacement policy that generated the fewest cache misses and rendered the models in the least time is the Least Recently Used (LRU) policy. The FIFO replacement policy also produced very positive results, achieving nearly the same performance as LRU. Increasing the cache size also improves performance, as fewer cache misses occur.
5.1 Future Work
As the results show, the frame rate achieved is not interactive, so future work should focus on increasing rendering performance. Many factors affect it: cache size, cache policy, cache implementation, number of kd-tree nodes, number of triangles inside leaf nodes, code optimization, the traversal routine, and the intersection test method. The current bottleneck is the kd-tree: the system generates large kd-trees in which some leaf nodes contain a large number of triangles. Optimally, leaf nodes contain few triangles so that fewer ray-triangle intersections are performed at runtime; a large number of triangles per leaf slows the system down. The biggest potential improvement lies in a better choice of splitting plane and termination criteria when building the kd-tree. There was no improvement when loading data asynchronously, as seen in the previous results, so better thread synchronization techniques, minimizing the time threads spend waiting for each other, would definitely improve performance. Wald also mentions ray packets as a more efficient traversal mechanism: currently one ray is traversed at a time, but Wald's method in [16] traverses groups of rays together. The main idea is that rays close to each other will most likely follow very similar traversal paths, so most of the data needed will already be in the CPU's cache, resulting in a major performance boost.
6. REFERENCES
[1] Arthur Appel. Some techniques for shading machine
renderings of solids. In Proceedings of the April
30–May 2, 1968, spring joint computer conference,
AFIPS ’68 (Spring), pages 37–45, New York, NY,
USA, 1968. ACM.
[2] William V. Baxter, III, Avneesh Sud, Naga K.
Govindaraju, and Dinesh Manocha. Gigawalk:
interactive walkthrough of complex environments. In
Proceedings of the 13th Eurographics workshop on
Rendering, EGRW ’02, pages 203–214, Aire-la-Ville,
Switzerland, Switzerland, 2002. Eurographics
Association.
[3] Jamis Buck. The recursive ray tracing algorithm.
http://reocities.com/SiliconValley/haven/5114/
raytracing.html. [Online] accessed 10-April-2013.
[4] Robert L. Cook. Stochastic sampling in computer
graphics. ACM Trans. Graph., 5(1):51–72, January
1986.
[5] David Luebke, Martin Reddy, Jonathan D. Cohen, Amitabh Varshney, Benjamin Watson, and Robert Huebner. Level of Detail for 3D Graphics. Morgan Kaufmann Publishers, 2003.
[6] Tomas Möller and Eric Haines. Occlusion culling algorithms. http://www.gamasutra.com/view/feature/131801/occlusion_culling_algorithms.php?page=3, 2009. [Online] accessed 24-May-2013.
[7] Kayvon Fatahalian, Edward Luong, Solomon Boulos,
Kurt Akeley, William R. Mark, and Pat Hanrahan.
Data-parallel rasterization of micropolygons with
defocus and motion blur. In Proceedings of the
Conference on High Performance Graphics 2009, HPG
’09, pages 59–68, New York, NY, USA, 2009. ACM.
[8] Markus Hadwiger and Andreas Varga. Visibility
culling, 1999.
[9] HellSky. 3d model of low poly fighter aircraft. http://hellsky.com/index.php/1316/3d-model-of-low-poly-fighter-aircraft/. [Online] accessed 24-May-2013.
[10] Lighthouse3D. Ray-triangle intersection.
http://www.lighthouse3d.com/tutorials/maths/
ray-triangle-intersection/. [Online] accessed
1-May-2013.
[11] Jonathan Macey. Ray-tracing and other rendering
approaches. http://nccastaff.bournemouth.ac.uk/
jmacey/CGF/slides/RayTracing4up.pdf. [Online]
accessed 30-April-2013.
[12] Matt Pharr and Greg Humphreys. Physically Based Rendering: From Theory to Implementation, Second Edition. Morgan Kaufmann Publishers, 2010.
[13] Prof. Morgan McGuire. Computational graphics.
http://graphics.cs.williams.edu/, 2009. [Online]
accessed 1-May-2013.
[14] Tomas Möller and Ben Trumbore. Fast, minimum storage ray-triangle intersection. J. Graph. Tools, 2(1):21–28, October 1997.
[15] Jarek Rossignac. Geometric simplification and
compression. 1997.
[16] Ingo Wald. Realtime Ray Tracing and Interactive
Global Illumination. PhD thesis, Computer Graphics
Group, Saarland University, 2004.

Out-of-core rendering of massive data sets
Review Paper

Kurt Portelli, Department of Computer Science, Faculty of Information & Communications Technology, University of Malta, kpor0005@um.edu.mt

Mr. Sandro Spina (Supervisor), Department of Computer Science, Faculty of Information & Communications Technology, University of Malta, sandro.spina@um.edu.mt

Mr. Keith Bugeja (Co-Supervisor), Department of Computer Science, Faculty of Information & Communications Technology, University of Malta, keith.bugeja@um.edu.mt

ABSTRACT
With the advancement of computer-aided design technologies and 3D capturing devices, complex, realistic geometric models containing millions of triangles can be created. Examples of such models include architectural buildings, complex CAD structures and 3D scans of heritage sites. These 3D models pose a number of challenges to current graphics systems when used for interactive display and manipulation. In this work we make use of out-of-core techniques in order to enable the rendering of massive data sets. Our rendering framework makes use of ray casting and an acceleration data structure, and renders models without the need to load them entirely into memory. Models are first added to the framework and organised into a tree-based acceleration data structure whose leaves contain the models' data. Depending on memory availability, two caches are created in main memory: one for the acceleration structure and another for the data stored at the leaf nodes. Various replacement-policy heuristics are tested; each manages and replaces the data inside the caches differently, allowing us to determine the best approach for different scenes. Figure 7 in section 4 shows a series of images of the power plant model, downloaded from [13], rendered by our system. This model contains about 12 million triangles, and each 512*512 image is rendered in about 21 seconds using the LRU cache replacement policy.

1. INTRODUCTION
Designers and 3D artists are often able to create large and complex 3D environments using the tools available nowadays. It is not uncommon to generate gigabyte-sized data sets such as power plants, ships and aeroplanes. These models are then used for simulations and design reviews [2]. Displaying such models has been a major challenge, especially when it needs to be done in an interactive environment.

Figure 1: (a) A highly detailed model, taken from [2]. (b) A low-detail model, taken from [9].

In order to visualise these 3D models, data representing geometry and material information is first loaded into main memory. However, if the model is larger than the available memory, out-of-core algorithms must be used to display it. Out-of-core algorithms are algorithms designed to process data that is larger than the available memory. Algorithms which help alleviate the problem and accelerate the generation of each frame include model simplification [15], visibility culling [8], level of detail [5], image or sample approximations [4] and occluder databases [6].

To keep up with the increasing complexity of 3D models, more advanced algorithms and better hardware acceleration are continuously being developed. Every small performance gain is spent on rendering more geometric surface detail, that is, more model primitives. The difference between the models in figures 1a and 1b is easy to see: figure 1a has much more detail due to its higher primitive count, but as a downside it consumes more memory and takes longer to render, since it contains millions of triangle primitives while figure 1b has only a few hundred. In general, since many operations (e.g. lighting, movement) are applied to the model's set of primitives, an increase in the number of primitives implies an increase in the computational time required.
Rasterization is the most popular method for interactive rendering, typically through OpenGL or DirectX. It takes the model's 3D geometric description and transforms it into two-dimensional points inside the view frame [12], and it has been implemented very efficiently in hardware. This hardware is the graphics card (or Graphics Processing Unit), and over time its performance has increased, allowing us to display ever larger and more complex models.
The main challenge in this method is determining visibility, that is, identifying the regions of surfaces in a scene that are visible from the virtual camera. This requires a lot of computational power, and data parallelism is used to attain a high frame rate. The increasing demand for rendering realistic images means that such systems will need to handle very complex geometry. Approximations are sometimes used instead to simplify such geometry while still rendering realistic images; however, they still lack detail when compared to true high-detail geometry. E. Luong [7] argues that bumpy or displaced surfaces cannot be reproduced faithfully using approximations, especially as the bar for image quality rises.

To render a bumpy surface one can either render a complex geometric surface or use a bump-mapping technique [12]. Bump mapping is an example of such an approximation: the surface itself is flat, but during lighting calculations its surface normals are perturbed to fake depth. When there is a need to render highly detailed models, these approximations might not generate the expected outcome. Thus, in order to model highly complex and detailed objects, a huge number of geometric primitives (e.g. triangles) is used to define the model's surface geometry. A model that contains millions of geometric primitives takes up a lot of space, will not fit inside memory, and slows the whole rasterization process down. A straightforward solution would be to increase the memory of the graphics card, but there are other ways to tackle this problem, and these are discussed in this paper.

The aim of this work is to develop an out-of-core rendering framework capable of loading massive models with limited memory available. This aim is split into the following milestones:

• Create a rendering system based on ray casting.
This ray-casting system uses an acceleration data structure to minimize the ray-intersection search space, thus rendering at a higher frame rate.

• Extend the rendering system to handle massive data sets that are larger than the available memory, using out-of-core rendering techniques.

• Design and evaluate different cache replacement policies within the system by simulating different scenarios and putting constraints on resources.

2. BACKGROUND
We now describe techniques and algorithms which will help the reader better understand the rest of the paper. The approach taken was to render the scene using ray casting, with an acceleration structure to speed up ray-intersection tests. The framework is able to load into main memory only those parts of the acceleration structure that are currently in use. This enables it to render models that are larger than main memory.

The aim of a rendering method is to generate an image of a three-dimensional scene and display it on screen, so clearly we first need to define a virtual camera. The virtual camera acts as the user's eyes inside the 3D space. As shown in figure 2, a camera is created as a point in 3D space with a direction. Apart from that, vectors indicating which way is up and which way is right are also needed; these help us calculate the view plane. The camera direction vector signifies where the camera is pointing, and the camera position together with the other three vectors makes up the whole camera. When the scene geometry is not changing, as in our case, animation occurs by updating these camera vectors. Two functions were added: one lets the camera move forward, and the other rotates the camera around the origin. While rendering, a frame buffer is used to hold the contents of all the pixels of the current frame.

Figure 2: Virtual camera definition.
Figure 3: Depiction of a ray cast, taken from [3].
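The virtual-camera definition above (a position plus direction, up and right vectors) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual interface: the vector type, the constructor arguments and the handedness convention for the right vector are all assumptions.

```cpp
#include <cmath>

// Minimal 3D vector type (illustrative; the paper does not prescribe one).
struct Vec3 {
    double x, y, z;
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
};

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

Vec3 normalize(const Vec3& v) {
    double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// A camera is a position plus three orthonormal vectors: the viewing
// direction and the up and right vectors that span the view plane.
struct Camera {
    Vec3 position, direction, up, right;

    Camera(const Vec3& pos, const Vec3& lookAt, const Vec3& worldUp)
        : position(pos),
          direction(normalize(lookAt - pos)),
          up{},
          right(normalize(cross(worldUp, direction)))  // one possible convention
    {
        up = cross(direction, right);  // completes the orthonormal basis
    }
};
```

Animating the camera, as the text notes, then amounts to updating these four quantities each frame while the scene geometry stays fixed.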
When all the pixels have been rendered, the system displays the image and then replaces the contents of the frame buffer with those of the next frame. This cycle continues until all the required frames are rendered.

2.1 Rendering method
The use of ray casting for rendering 3D scenes was introduced in 1968 by Arthur Appel [1]. Rays are generated from the virtual camera and shot through each pixel of the view plane; the closest object is determined, and the colour of the ray is calculated from the object's properties. Appel also hinted at the use of shadow rays, shot towards the light sources, to determine shadows. Ultimately this tells us whether a light source is occluded or not, influencing the shading of the primary rays.

Ray casting simply returns the triangle hit by the primary ray. More advanced designs would take into consideration light sources in the scene. A light source emits a ray of light, as depicted in figure 3, which can be seen as a stream of photons travelling along a straight line (in a vacuum); when it hits a surface, three things happen: absorption, reflection and refraction. In ray casting the process stops when an object is hit, as shown in figure 4, while in ray tracing all the reflected and refracted rays are followed as well. Thus, ray tracing is much more computationally intensive [11], since there are more rays to calculate and keep track of.

Figure 4: Ray-triangle intersection.
Figure 5: Shooting rays towards the pixels inside the image plane.

The number of rays shot into the 3D scene is determined by the width and height of the image we want to generate: multiplied together, they give the total number of pixels in the image, or rather the total number of rays we need to shoot into the scene to generate a complete frame. A 1920 * 1080 (width * height) resolution results in a total of about 2 million rays. For each pixel in the generated image a ray is created; its direction is computed by subtracting the camera position from the 3D pixel location, as shown in figure 5. The last step, for each ray, is to find its closest intersection, if any, with the 3D model. When all the rays have been processed, colour can be given to the image by checking the distance between the camera position and the hit location of each pixel, effectively computing a depth map.

Models measuring in gigabytes contain millions of primitives, so checking for the closest intersection against all of a model's primitives would be too slow; an acceleration structure is therefore used. The acceleration structure allows us to search inside the model in logarithmic time, rendering each frame faster.
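The primary-ray generation just described can be sketched directly: one ray per pixel, with the direction obtained by subtracting the camera position from the pixel's 3D location and normalising. The names below are illustrative, not taken from the paper's framework.

```cpp
#include <cmath>
#include <cstddef>

struct Vec3 { double x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

static Vec3 normalize(const Vec3& v) {
    double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

struct Ray { Vec3 origin, dir; };

// One primary ray per pixel: its direction is the (normalised) vector from
// the camera position to the pixel's 3D location on the view plane.
Ray primaryRay(const Vec3& cameraPos, const Vec3& pixelPos3d) {
    return {cameraPos, normalize(sub(pixelPos3d, cameraPos))};
}

// The total number of primary rays equals the number of pixels.
std::size_t rayCount(std::size_t width, std::size_t height) {
    return width * height;
}
```

For the 1920 * 1080 resolution mentioned above, `rayCount(1920, 1080)` gives 2,073,600 rays, i.e. the "about 2 million" figure in the text.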
2.2 Acceleration data structure
An acceleration data structure is a data container that lets us retrieve the data we actually need in, on average, logarithmic time. During traversal each ray needs to return its closest intersection, so the speed of each ray intersection determines the total time to render a frame. Storing all the model's triangle primitives as a flat list is not efficient: each frame would take a very long time to render, since large models contain millions of triangles. The aim of an acceleration data structure is to accelerate this process by returning a small subset of the object's triangles, namely only those close to the ray, reducing the total number of intersection checks. This speeds up the calculation of each pixel and generates each frame in much less time.

2.3 Memory management
An out-of-core rendering system requires some form of memory management, as data is dynamically moved from secondary storage to main memory. New data is also computed at runtime and needs to be stored, while at the same time memory usage must be limited so that the system does not run out of memory. Such a system needs to handle large quantities of data that do not fit inside memory.

A memory limit is enforced by the memory management system, and data is then allocated accordingly; this includes triangles, vertices, the acceleration data structure, camera properties and frame buffers. For the large models containing millions of triangles tested in this dissertation, one cannot load everything into memory. The memory management system allows dynamic allocation of data, so that unused data can be swapped with the data currently needed from secondary storage. Page faults occur when the system queries for data that is not currently in memory, so the system needs to be able to load data on demand, at runtime, as they happen.
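The on-demand loading described above can be sketched as a fixed-capacity cache with a replacement policy. The paper evaluates several policies; the sketch below shows only LRU (the policy named in the abstract), and the class name, key/value types and loader callback are illustrative assumptions, not the framework's actual interfaces.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>

// Fixed-capacity cache with least-recently-used eviction. Keys could be
// node or leaf identifiers; values the data loaded from secondary storage.
template <typename K, typename V>
class LruCache {
public:
    LruCache(std::size_t capacity, std::function<V(const K&)> loader)
        : capacity_(capacity), loader_(std::move(loader)) {}

    // Returns the cached value, loading it on a miss (a "page fault")
    // and evicting the least-recently-used entry if the cache is full.
    const V& get(const K& key) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            order_.splice(order_.begin(), order_, it->second);  // mark as MRU
            return it->second->second;
        }
        if (order_.size() >= capacity_) {   // evict the LRU entry (list back)
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, loader_(key));  // load from secondary storage
        index_[key] = order_.begin();
        return order_.front().second;
    }

    std::size_t size() const { return order_.size(); }

private:
    std::size_t capacity_;
    std::function<V(const K&)> loader_;
    std::list<std::pair<K, V>> order_;  // front = most recently used
    std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index_;
};
```

Counting calls to the loader directly measures the page faults that the replacement policy is supposed to minimise.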
Memory mapped files are used to load and save data inside the secondary stor- age and are considered as virtual memory by the out-of-core rendering framework. Each page fault decreases performance since retrieving data from secondary storage is very time consuming. Thus the system tries to minimize the number of page faults by us- ing efficient cache replacement policies. The memory is considered as a cache for all the model’s data and only the data that is being used and considered important by the replacement policies is kept inside memory. When the cache(memory allocated) is empty, the data needed is stored immediately. When memory is full and new data needs to be saved, depending on replacement policies some of the old data would be deleted. Part of the evaluation of this disser- tation is to evaluate multiple cache policies and determine the best one for a particular scene. 2.4 Parallelism The out-of-core rendering framework can also utilize multi threading in different areas of the system to gain in perfor- mance. Ray casting as explained before is the process of shooting a ray for each pixel. Thus each pixel can be ren- dered independently in a different thread. The rendering process and memory management can also be split into dif- ferent threads so that memory management does not hold
back the rendering process when a page fault occurs. This is explained in detail in section 3.2.1.

Although splitting the process into different threads can increase performance, this must be done with care, since the threads share memory. Threads can change the contents of shared data, and different versions of this data may then exist, producing inconsistent and incorrect results. Race conditions occur when the ordering of threads during execution affects the outcome. To solve this problem, thread synchronisation is used: each thread waits for other threads to finish using a shared resource. This ensures that no thread accesses shared data that is currently in use, and therefore that correct results are generated. Thread synchronisation can, however, slow the process down, since threads must now wait for each other when executing critical sections. Thus the critical section (where shared resources are used), which is the sequential component, must be concise and fast, so that the other threads do not spend a long time waiting.

3. DESIGN AND IMPLEMENTATION

3.1 kd-tree

The aim of storing the primitives inside a structure is to return only the primitives closest to the ray. This implicitly reduces the number of ray-primitive intersections and speeds up the whole process. It is important to note that although such a structure speeds up rendering, it introduces a memory overhead; each node should therefore be as small as possible to minimise this overhead in space. The nodes must be stored in a way that lets the system access them quickly and exploit the operating system's caching mechanisms fully. The acceleration structure used is the kd-tree, since in practice kd-trees have been shown to perform best most of the time, owing to their simplicity and their ability to adapt to scene complexity [16].
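As a rough illustration of how such a tree adapts to the scene, the sketch below builds a kd-tree by median-splitting triangle centroids on alternating axes. This is a deliberate simplification for illustration (the actual construction, described in section 3.1.1, chooses split planes more carefully), and the class names and thresholds are assumptions:

```python
MAX_DEPTH = 20   # illustrative termination thresholds,
LEAF_SIZE = 8    # not the values used in the dissertation

class Leaf:
    def __init__(self, tris):
        self.tris = tris

class Inner:
    def __init__(self, axis, split, left, right):
        self.axis, self.split = axis, split
        self.left, self.right = left, right

def centroid(tri, axis):
    """Average of the three vertex coordinates along one axis."""
    return sum(v[axis] for v in tri) / 3.0

def build(tris, depth=0):
    # Terminate: few triangles left, or the tree is deep enough.
    if len(tris) <= LEAF_SIZE or depth >= MAX_DEPTH:
        return Leaf(tris)
    axis = depth % 3                      # cycle the x, y, z split axes
    tris = sorted(tris, key=lambda t: centroid(t, axis))
    mid = len(tris) // 2
    split = centroid(tris[mid], axis)     # median split plane
    return Inner(axis, split,
                 build(tris[:mid], depth + 1),
                 build(tris[mid:], depth + 1))
```

Dense regions of the model recurse deeper, so leaves end up holding only a handful of nearby triangles, which is exactly the property traversal relies on.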
3.1.1 Construction

The kd-tree construction algorithm is implemented recursively. On each call the algorithm chooses a split plane and partitions the triangles into two inner nodes, until a termination condition is met and a leaf node is returned. The choice of split plane determines the efficiency and size of the kd-tree. Construction takes a very long time, but it is performed only once, at the beginning, and the tree is then reused for every frame.

3.1.2 Traversal

The traversal algorithm is called once for every pixel, so it needs to be very efficient and optimised. Kd-trees offer very simple traversal code, which allows further optimisation. Ingo Wald's traversal code is in fact highly optimised; he describes it in pseudocode in Realtime Ray Tracing and Interactive Global Illumination [16]. Because his implementation is already tested, and his published results confirm it is one of the best, we based most of our traversal algorithm on his work. When shooting a ray down a kd-tree we only need to make binary choices: left node or right node.

Figure 6: 3 different traversal scenarios

Whether to traverse left or right is determined by how the ray hits the split plane, as shown by the three scenarios in figure 6. This method also lets us use culling techniques: since we can determine which side of the split plane is hit first, part of the tree can be skipped. The implementation is very efficient because all computation is done using ray segments, which are one-dimensional computations [16]. The actual 3D coordinates of the entry, exit and intersection points are never even computed, which greatly improves performance.

3.1.3 Triangle Intersection

This code is run thousands of times per second, so it needs to be very optimised and efficient. The algorithm used is the one designed by Möller and Trumbore in [14]. A ray R(t) is defined as R(t) = O + tD,
O being the origin and D the normalised direction vector. A triangle, on the other hand, is defined by three vertices V0, V1, V2. The algorithm needs to find the t at which the ray intersects the triangle's plane, and then check whether the intersection point lies within the three vertices. Using barycentric coordinates [10] is the fastest known way to determine whether an intersection lies inside the triangle and what its exact coordinates are. In our implementation the only quantity needed was the distance from the intersection to the camera position, so that the colour could be calculated; the exact 3D location of the intersection was not required. The idea behind barycentric coordinates can be seen as placing weights on the triangle's vertices; these weights then determine a point inside or outside the triangle.

3.2 System Cache

A fixed size for the number of nodes and the amount of leaf data stored in memory had to be set, so that the system can run on different operating system configurations. By leaf data we mean the list of triangles contained inside a leaf node. Taking all the leaf nodes' lists and storing them as one whole list takes up a lot of space, even gigabytes. A limit was therefore set on the amount of space used to store the nodes and the triangle lists. Two separate caches were created, each with its own cache replacement policy, and each can be set to a different size. Two other very important arrays are kept in memory: the triangles and the vertices of the 3D model currently loaded. These two arrays are accessed constantly and need to offer the best possible access time, so they are stored as plain arrays and always kept in memory.

3.2.1 Loading
Frame | Decaying Frequency | FIFO | Frequency | LRU
    1 |                3.4 |  3.8 |       3.0 |  3.8
    3 |                7.1 | 10.9 |       6.4 | 11.1
    6 |                6.7 | 11.1 |       6.1 | 11.4
    9 |                6.1 | 10.5 |       5.5 | 10.7
   12 |                5.7 | 10.2 |       5.2 | 10.2
   15 |                5.2 |  9.5 |       4.7 |  9.6

Table 1: Ray casts per second (in thousands) for a 256 MB cache

Frame | Decaying Frequency | FIFO | Frequency | LRU
    1 |                4.4 |  4.6 |       4.4 |  4.6
    3 |               10.9 | 11.1 |      11.2 | 11.5
    6 |               11.1 | 12.0 |      11.1 | 12.0
    9 |               10.5 | 11.6 |      10.5 | 11.5
   12 |               10.1 | 11.2 |      10.1 | 11.3
   15 |                9.6 | 11.0 |       9.4 | 11.1

Table 2: Ray casts per second (in thousands) for a 768 MB cache

Both caches work in a very similar way, since the methods they expose are very similar: checking whether a node is in the cache, and adding a node to the cache. These methods hide all the complications that arise when a node is not in the cache and must be retrieved from secondary storage. The system offers two loading methods, synchronous and asynchronous. With the synchronous method, the system waits for the cache to load the node when a cache miss occurs. The asynchronous method queues the current pixel and continues rendering other pixels until the required data has been loaded from secondary storage.

3.2.2 Configuration

The cache was designed to be as extensible as possible, so that multiple cache replacement policies could be implemented and tested. The replacement policies implemented are First In First Out (FIFO), Least Recently Used (LRU), Decaying Frequency and Frequency. All the caches use set associativity as their mapping technique, since it gives enough flexibility to implement the replacement policies.

4. EVALUATION AND RESULTS

To evaluate the system, multiple tests were carried out using different models and cache configurations, while varying the amount of memory the system may use. The same camera path is used throughout all tests for a consistent evaluation: a 512 x 512 frame is rendered every two degrees, and the path produces 15 frames, rotating through a total angle of 30 degrees.
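The replacement policies compared here differ mainly in their eviction order. A minimal single-set sketch (ignoring the set-associative mapping described in section 3.2.2) shows how FIFO and LRU diverge on the same access trace:

```python
from collections import OrderedDict

def simulate(accesses, capacity, policy):
    """Count cache misses for an access trace under FIFO or LRU eviction."""
    cache = OrderedDict()
    misses = 0
    for key in accesses:
        if key in cache:
            if policy == "LRU":
                cache.move_to_end(key)   # LRU refreshes recency on every hit
            # FIFO ignores hits: eviction order is insertion order only.
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the oldest entry
            cache[key] = None
    return misses
```

On a trace that keeps re-touching one hot node, LRU retains it while FIFO eventually evicts it, which is one intuition for why LRU edged out the other policies in the tables above.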
Two sets of results are generated: the first set uses a single thread both for rendering and for loading missing data, while the second set uses two threads, one for loading and the other for rendering the scene. The aim of these tests is to determine the best technique and cache replacement policy for this particular camera path. Tables 1 and 2 show the rays per second the system computes (in thousands). The fastest frame was rendered at around 12k rays per second, which means that the frame was generated in about 21 seconds. Loading the whole model together with the acceleration data structure would take around 4.6 GB. The tests loaded all the triangles and vertices and then used the two additional caches, varying the cache size in each test: in the first test the limit was set at 256 MB, and in the second at 768 MB. Thus, in the first test we were using only (((256 MB x 2) + 200 MB) / 4410 MB) x 100 = 16% of the memory that would be needed if no out-of-core techniques were used. Note that the extra 200 MB added to the space consumed by the caches is the space consumed by the triangles and vertices loaded into memory. In the second test scenario 39% of the total data was loaded into memory, so better performance was obtained, as can be seen in table 2. Fewer cache misses occurred as the cache size increased, since more data was kept in memory.

The first thing one notices in tables 1 and 2 is the performance difference between the first frame and the rest. This is because during the first frame data is still being loaded into memory, since the leaf data cache starts empty. The cache replacement policies that gave the best results were FIFO and LRU.

Frame | LRU Asynch | LRU Synch
    1 |        4.3 |       4.6
    3 |       11.3 |      11.5
    6 |       11.6 |      12.0
    9 |       11.3 |      11.5
   12 |       11.1 |      11.3
   15 |       10.8 |      11.1

Table 3: Ray casts per second (in thousands) for a 768 MB cache
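The asynchronous loading scheme of section 3.2.1 (queue the pixel and keep rendering) can be sketched as below. This is a single-threaded simulation of the policy; `node_of` and `schedule_load` are hypothetical hooks standing in for the background loader thread:

```python
from collections import deque

def render_async(pixels, node_of, cache, schedule_load):
    """Sketch of the asynchronous policy: on a cache miss, schedule the
    load and requeue the pixel instead of blocking the renderer.

    `node_of` maps a pixel to the leaf node it needs; `schedule_load`
    returns immediately, with the data appearing in `cache` later.
    """
    queue = deque(pixels)
    shaded = []
    while queue:
        px = queue.popleft()
        node = node_of(px)
        if node in cache:
            shaded.append(px)            # data present: shade the pixel now
        else:
            schedule_load(node, cache)   # non-blocking load request
            queue.append(px)             # retry this pixel later
    return shaded
```

The sketch also hints at the drawback observed below: a requeued pixel may find its data already evicted again by the time it is retried, so entries can be replaced before ever being used.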
An interesting observation is that when the cache size was very small relative to the 3D model, the Frequency and Decaying Frequency replacement policies produced many cache misses and were in fact the worst performers. As the cache size increased they gained a major performance boost and caught up with the other replacement policies.

Table 3 shows the rays per second (in thousands) when running the system on the power plant model with the LRU cache, loading data both synchronously and asynchronously. The cache size is also varied, and one can easily see the performance boost when the size is increased. Analysing the tables, one notices that loading the data asynchronously gives no performance boost in most cases, even though in theory it should, since the CPU would spend less time idle. The main reason no improvement is seen is the additional complexity the system needs in order to load data asynchronously and still produce correct results. With the current implementation, asynchronous loading causes more cache misses, since data is replaced before it has been used and must then be queued for loading again.

5. CONCLUSION

We have presented an out-of-core rendering system that renders massive data sets and lets the user navigate through the 3D space using a virtual camera. The system uses ray casting as its rendering method and an acceleration data structure to increase the frame rate. The acceleration data
structure used is the kd-tree. Depending on the amount of memory available, our rendering framework loads, on demand, different parts of the structure and of the model's data. The system's main memory is used as a temporary cache into which the data required to render a frame is loaded. A number of cache replacement policies have been implemented, which try to minimise the number of cache swaps at runtime between successive frames. The framework also has the option of spawning an extra thread that performs the loading from secondary storage separately, with the aim of improving rendering speed.

Figure 7: Power Plant images

From the results obtained, the cache replacement policy that generated the fewest cache misses and rendered the models in the least time is Least Recently Used (LRU). The FIFO replacement policy also produced very positive results, achieving nearly the same performance as LRU. Increasing the cache size also improves performance, as fewer cache misses occur.

5.1 Future Work

As the results show, the frame rate achieved is not interactive, so future work should focus on increasing rendering performance. Many factors affect rendering performance, such as cache size, cache policy, cache implementation, the number of kd-tree nodes, the number of triangles inside leaf nodes, code optimisation, and the traversal and intersection test methods. The bottleneck of the system is currently the kd-tree, since the system generates large kd-trees in which some leaf nodes contain a large number of triangles. Optimally, leaf nodes contain few triangles, so that fewer ray-triangle intersections are performed at runtime; a large number of triangles therefore slows the system down. The biggest potential gain lies in improving the choice of splitting plane and the termination criteria when building the kd-tree.
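One widely used way to improve the split-plane choice, discussed at length by Wald [16], is the surface area heuristic (SAH), which estimates the expected traversal cost of each candidate plane. The sketch below is a simplified one-dimensional version: interval length stands in for surface area, and the cost constants are illustrative assumptions, not measured values:

```python
C_TRAV, C_ISECT = 1.0, 2.0   # illustrative traversal/intersection costs

def sah_cost(split, lo, hi, tri_spans):
    """1D sketch of the SAH: the probability of a ray entering a child is
    proportional to its extent (standing in for surface area), weighted by
    how many triangle spans overlap that child."""
    n_left = sum(1 for a, b in tri_spans if a <= split)
    n_right = sum(1 for a, b in tri_spans if b >= split)
    p_left = (split - lo) / (hi - lo)
    p_right = (hi - split) / (hi - lo)
    return C_TRAV + C_ISECT * (p_left * n_left + p_right * n_right)

def best_split(lo, hi, tri_spans, candidates):
    """Pick the candidate plane with the lowest estimated cost."""
    return min(candidates, key=lambda s: sah_cost(s, lo, hi, tri_spans))
```

A cost-driven choice like this tends to cut off empty space tightly, which directly addresses the overfull leaves identified as the bottleneck above.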
There was no improvement when loading data asynchronously, as seen in the previous results, so better thread synchronisation techniques that minimise the time threads spend waiting for each other would definitely improve performance. Wald also mentions ray packets as a more efficient traversal mechanism: currently one ray is traversed at a time, but Wald's method in [16] traverses groups of rays together. The main idea is that rays close to each other will most likely follow very similar traversal paths, so most of the data needed will already be in the CPU's cache, resulting in a major performance boost.

6. REFERENCES

[1] Arthur Appel. Some techniques for shading machine renderings of solids. In Proceedings of the April 30-May 2, 1968, Spring Joint Computer Conference, AFIPS '68 (Spring), pages 37-45, New York, NY, USA, 1968. ACM.

[2] William V. Baxter, III, Avneesh Sud, Naga K. Govindaraju, and Dinesh Manocha. GigaWalk: interactive walkthrough of complex environments. In Proceedings of the 13th Eurographics Workshop on Rendering, EGRW '02, pages 203-214, Aire-la-Ville, Switzerland, 2002. Eurographics Association.

[3] Jamis Buck. The recursive ray tracing algorithm. http://reocities.com/SiliconValley/haven/5114/raytracing.html. [Online] accessed 10-April-2013.

[4] Robert L. Cook. Stochastic sampling in computer graphics. ACM Trans. Graph., 5(1):51-72, January 1986.

[5] David Luebke, Martin Reddy, Jonathan D. Cohen, Amitabh Varshney, Benjamin Watson, and Robert Huebner. Level of Detail for 3D Graphics. Morgan Kaufmann Publishers, 2003.

[6] Eric Haines and Tomas Möller. Occlusion culling algorithms. http://www.gamasutra.com/view/feature/131801/occlusion_culling_algorithms.php?page=3, 2009. [Online] accessed 24-May-2013.

[7] Kayvon Fatahalian, Edward Luong, Solomon Boulos, Kurt Akeley, William R. Mark, and Pat Hanrahan. Data-parallel rasterization of micropolygons with defocus and motion blur.
In Proceedings of the Conference on High Performance Graphics 2009, HPG '09, pages 59-68, New York, NY, USA, 2009. ACM.

[8] Markus Hadwiger and Andreas Varga. Visibility culling, 1999.

[9] HellSky. 3D model of low poly fighter aircraft. http://hellsky.com/index.php/1316/3d-model-of-low-poly-fighter-aircraft/. [Online] accessed 24-May-2013.

[10] Lighthouse3D. Ray-triangle intersection. http://www.lighthouse3d.com/tutorials/maths/ray-triangle-intersection/. [Online] accessed 1-May-2013.

[11] Jonathan Macey. Ray-tracing and other rendering approaches. http://nccastaff.bournemouth.ac.uk/jmacey/CGF/slides/RayTracing4up.pdf. [Online] accessed 30-April-2013.

[12] Matt Pharr and Greg Humphreys. Physically Based Rendering: From Theory to Implementation, Second Edition. Morgan Kaufmann Publishers, 2010.

[13] Morgan McGuire. Computational graphics. http://graphics.cs.williams.edu/, 2009. [Online] accessed 1-May-2013.

[14] Tomas Möller and Ben Trumbore. Fast, minimum storage ray-triangle intersection. J. Graph. Tools, 2(1):21-28, October 1997.

[15] Jarek Rossignac. Geometric simplification and compression. 1997.

[16] Ingo Wald. Realtime Ray Tracing and Interactive Global Illumination. PhD thesis, Computer Graphics Group, Saarland University, 2004.